[57] ABSTRACT

A computer-simulated cortical network computes the visibility of shifts in the direction of movement and computes: 1) the magnitude of the position difference between the test and background patterns; 2) localized contrast differences at different spatial scales, analyzed by computing temporal gradients of the difference and sum of the outputs of paired even- and odd-symmetric bandpass filters convolved with the input pattern; and 3) using global processes that pool the output from paired even- and odd-symmetric simple and complex cells across the spatial extent of the background frame of reference, the direction a test pattern moved relative to a textured background. The direction of movement of an object in the field of view of a robotic vision system is detected in accordance with nonlinear Gabor function algorithms. The movement of objects relative to their background is used to infer the 3-dimensional structure and motion of object surfaces.

27 Claims, 3 Drawing Sheets 
METHOD AND APPARATUS FOR PREDICTING THE DIRECTION OF MOVEMENT IN MACHINE VISION

ORIGIN OF THE INVENTION

The invention described herein was made in the performance of work under a NASA contract, and is subject to the provisions of Public Law 96-517 (35 USC 202) in which the Contractor has elected not to retain title.

TECHNICAL FIELD 

The invention relates to methods and apparatus employed in machine vision and, more particularly, to a machine vision system for determining movement in a 2-dimensional field of view comprising: video camera means for viewing the field of view and producing 2-dimensional binary representations of the pattern thereof at consecutive times $t_1$ and $t_2$; image enhancement means for receiving the 2-dimensional binary representations of the field of view and for producing enhanced 2-dimensional binary representations thereof; computational means for producing smoothed versions of the enhanced 2-dimensional binary representations, the computational means including means for filtering binary data comprising the enhanced 2-dimensional binary representations to provide the spatial gradients of components thereof comprising a background frame of reference and components identified with objects moving against the background frame of reference; means for producing the temporal gradients of the enhanced 2-dimensional binary representations; and sensing means for comparing the spatial and temporal gradients at the consecutive times to one another to determine any motion parallax existing in the field of view, whereby movement of objects in the field of view is determined.

In the preferred embodiment, the outputs from paired Gabor filters summed across the background frame of reference are used by the sensing means to determine the direction of movement.

The preferred embodiment additionally includes: means for encoding the image intensity of the pattern of the field of view by separable spatial and temporal filters consisting of paired even- and odd-symmetric simple cells in quadrature phase; for each combination of the paired even- and odd-symmetric filters at each spatial scale being analyzed, means for passing the input signal through the spatial and temporal filters to produce four separable responses, wherein the output from each spatial filter is processed during two different time intervals by a low-pass temporal filter and a bandpass temporal filter; and means for taking the sums and differences of the outputs from the paired filters to produce spatiotemporally oriented nonlinear responses that are selective for the direction of any motion.
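How sums and differences of four separable responses become direction selective can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the input is a 1-D drifting grating, and for clarity the low-pass/bandpass temporal pair described above is swapped for a temporal quadrature pair (cosine- and sine-modulated Gaussians in time), as in standard motion-energy models. Squaring and adding each oriented pair is likewise an assumed choice of nonlinearity. All function names and parameter values are hypothetical.

```python
import math

def quadrature_pair(half_width, freq, sigma):
    """Even (cosine) and odd (sine) Gabor kernels sampled at integer offsets."""
    offsets = list(range(-half_width, half_width + 1))
    env = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    even = [math.cos(2 * math.pi * freq * k) * g for k, g in zip(offsets, env)]
    odd = [math.sin(2 * math.pi * freq * k) * g for k, g in zip(offsets, env)]
    return offsets, even, odd

def drifting_grating(x, t, fx, ft):
    """Sinusoid drifting toward +x when ft > 0 and toward -x when ft < 0."""
    return math.cos(2 * math.pi * (fx * x - ft * t))

def direction_energies(ft, fx=0.1, ft0=0.1, sigma_x=10.0, sigma_t=10.0, t0=0):
    """Return (energy toward +x, energy toward -x) at position 0 and time t0.

    Hypothetical units: fx in cycles/sample, ft and ft0 in cycles/frame.
    """
    xs, s_even, s_odd = quadrature_pair(40, fx, sigma_x)
    taus, t_even, t_odd = quadrature_pair(40, ft0, sigma_t)

    def spatial(t):
        # Even and odd spatial-filter outputs at the filter centre at time t.
        e = sum(drifting_grating(x, t, fx, ft) * w for x, w in zip(xs, s_even))
        o = sum(drifting_grating(x, t, fx, ft) * w for x, w in zip(xs, s_odd))
        return e, o

    # Four separable responses: each spatial output through each temporal filter.
    ee = eo = oe = oo = 0.0
    for tau, te, to in zip(taus, t_even, t_odd):
        e, o = spatial(t0 - tau)
        ee += e * te
        eo += e * to
        oe += o * te
        oo += o * to

    # Sums and differences give spatiotemporally oriented quadrature pairs;
    # squaring and adding each pair gives a phase-invariant energy.
    toward_plus = (ee - oo) ** 2 + (eo + oe) ** 2
    toward_minus = (ee + oo) ** 2 + (eo - oe) ** 2
    return toward_plus, toward_minus

plus, minus = direction_energies(ft=0.1)  # grating drifting toward +x
```

For a grating drifting toward +x the first energy dominates; flipping the sign of `ft` makes the second one dominate, and the result does not depend on the grating's phase at `t0`.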

In the preferred embodiment, the separable spatial and temporal filters comprise:

an even-symmetric Gabor function, $F_{ES}$, described as

$$F_{ES}(f, x, \sigma^2) = \cos(2\pi f x)\, e^{-(x - x_0)^2 / 2\sigma^2}$$

and an odd-symmetric Gabor function, $F_{OS}$, described as

$$F_{OS}(f, x, \sigma^2) = \sin(2\pi f x)\, e^{-(x - x_0)^2 / 2\sigma^2}$$


where $f$ corresponds to the spatial frequency of the pattern, $x$ corresponds to the horizontal spatial position being examined, $x_0$ corresponds to the fixation point of the video camera means, and $\sigma^2$ corresponds to the variability within the pattern's spatial period in locating the position of the most salient contrast difference, $x_0$.
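As a sketch only, the two Gabor functions can be coded directly from their definitions, F_ES = cos(2πfx)·exp(-(x-x0)²/2σ²) and F_OS = sin(2πfx)·exp(-(x-x0)²/2σ²), and used as a quadrature pair. For a pattern cos(2πfx + φ) of matching frequency, the even filter's response is proportional to cos φ and the odd filter's to -sin φ, so `atan2` recovers the pattern's spatial phase (and hence its position) relative to the fixation point x0. The Riemann-sum integration, step size, window half-width, and helper names are assumptions for illustration.

```python
import math

def f_es(f, x, sigma2, x0=0.0):
    """Even-symmetric Gabor: cos(2*pi*f*x) * exp(-(x - x0)**2 / (2*sigma2))."""
    return math.cos(2 * math.pi * f * x) * math.exp(-((x - x0) ** 2) / (2 * sigma2))

def f_os(f, x, sigma2, x0=0.0):
    """Odd-symmetric Gabor: sin(2*pi*f*x) * exp(-(x - x0)**2 / (2*sigma2))."""
    return math.sin(2 * math.pi * f * x) * math.exp(-((x - x0) ** 2) / (2 * sigma2))

def pair_response(signal, f, sigma2, x0=0.0, half_width=3.0, step=1e-3):
    """Riemann-sum inner products of signal(x) with F_ES and F_OS.

    half_width should span several standard deviations of the envelope
    (3.0 covers 6 sigma when sigma2 = 0.25).
    """
    n = int(round(half_width / step))
    even = odd = 0.0
    for i in range(-n, n + 1):
        x = x0 + i * step
        s = signal(x)
        even += s * f_es(f, x, sigma2, x0) * step
        odd += s * f_os(f, x, sigma2, x0) * step
    return even, odd

def spatial_phase(signal, f, sigma2):
    """Phase phi of a pattern cos(2*pi*f*x + phi) seen through the pair at x0 = 0."""
    even, odd = pair_response(signal, f, sigma2)
    return math.atan2(-odd, even)

# Usage: a grating shifted by a quarter period (90 degrees of spatial phase).
phi = spatial_phase(lambda x: math.cos(2 * math.pi * x + math.pi / 2), f=1.0, sigma2=0.25)
```

The 90° offset between the two filters is what makes the pair usable for phase: with only one filter, a change in contrast and a change in position would be indistinguishable.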

Additionally, the low-pass temporal filter $F_{ES}$ is described by

$$F_{ES} = \sum_{x = x_0 - B/2}^{x_0 + B/2} \frac{k_b C_b^n + k_{fb}}{C_b^n + C_0} + \frac{k_t C_t^n + k_{ft}}{C_t^n + C_0} \quad \text{at time } t_2$$

and the bandpass temporal filter $F_{OS}$ is described by

$$F_{OS} = \sum_{x = x_0 - B/2}^{x_0 + B/2} \frac{k_b C_b^n + k_{fb}}{C_b^n + C_0^m} + \frac{k_t C_t^n + k_{ft}}{C_t^n + C_0} \quad \text{at time } t_1$$


where $C_t$ corresponds to the contrast of a test pattern, $k_{ft}$ corresponds to the contrast threshold for a test frequency, $C_b$ corresponds to the contrast of the pattern of the background frame of reference, $k_{fb}$ corresponds to the contrast threshold for the spatial frequencies of the background frame of reference, $C_0$, which depends on the temporal frequency, is a constant that corresponds to the signal-to-noise ratio used when detecting left-right movement of a test pattern relative to the background frame of reference, $n$ corresponds to the slope of the contrast response function, $m$ is a constant (usually 1, but it may be 2 as a result of rectification, which occurs only at high temporal frequencies), $B$ corresponds to the spatial period of the background frame of reference, $x_0$ corresponds to a zero-crossing or contrast difference used as the point of reference to judge the direction the test pattern moved between two pattern presentations at times $t_1$ and $t_2$, and $k_t$, $k_b$ are changeable constants for the gain of the contrast sensitivity, which may be changed by feedback.
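One hedged reading of the temporal-filter expressions is that each sums a saturating contrast-response term of the form (k·C^n + k_f)/(C^n + C0) over background positions from x0 - B/2 to x0 + B/2, and adds the same kind of term for the test pattern. The sketch below evaluates that reading at a single time. It treats positions as integer samples, takes the background contrast as a function of position, and leaves out the exponent m, whose exact placement it does not assume. Function and parameter names are hypothetical.

```python
def contrast_response(c, n, k_gain, k_threshold, c0):
    """Saturating contrast-response term (k * C**n + k_f) / (C**n + C0)."""
    cn = c ** n
    return (k_gain * cn + k_threshold) / (cn + c0)

def pooled_filter_output(c_test, c_background, x0, b, n, k_t, k_ft, k_b, k_fb, c0):
    """Background term summed over x0 - B/2 .. x0 + B/2, plus the test term.

    c_background: callable giving the background contrast at integer position x.
    b: width of the background frame of reference, in samples (hypothetical units).
    """
    half = b // 2
    background = sum(
        contrast_response(c_background(x), n, k_b, k_fb, c0)
        for x in range(x0 - half, x0 + half + 1)
    )
    return background + contrast_response(c_test, n, k_t, k_ft, c0)
```

At zero contrast each term reduces to k_f / C0, and at high contrast it saturates at the gain k; widening `b` adds more background terms, which is one way the width of the frame of reference can enter the pooled response.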

BACKGROUND ART 

In the field of robotics and the like, it is often desirable to provide the equipment with an optical input device which will allow the equipment to “see” what it is doing and make adaptive control decisions based thereon. Such “machine vision” applications can generally be classified in one of two categories: vision to control movement of a portion of the device with respect to a target area, and vision to control movement of the device itself with respect to its surroundings. A robotic assembly device moving an arm with respect to an article being assembled thereby is an example of the first type, while a robotic vehicle moving across an area is an example of the latter. Determining the direction an object moves relative to the observer is used to disambiguate objects from the background frame of reference. To navigate effectively, both human and computer vision systems use the optical flow of objects in the environment relative to the observer.

In machine vision it is frequently necessary to determine the direction of motion of an object in the field of view with reference to the background. This is especially necessary for machines that are themselves moving, such as planetary rovers and the like. This capability is required because such movement is used to infer the 3-dimensional structure and motion of object surfaces. Apparent movement is perceived when an object appears at one spatial position and then reappears at a second nearby spatial position a short time later. For example, when two similar lights flash asynchronously against a dark background at night, the observer “sees” an apparent movement of one light. The shift in the spatial position of contrast differences (i.e., light against a dark background) over a short time interval induces the perception of motion. The direction an object is perceived to move is judged relative to a background frame of reference. Figure-ground segmentation precedes the determination of the direction of movement. When navigating through the environment, objects are perceived to move relative to a textured stationary background. Both the spatiotemporal characteristics of the object and the background frame of reference are used to determine the perceived direction of movement.

Prior art machine vision systems have been designed in a machine-like fashion; that is, they take what amounts to a “snapshot” of the scene, delay, take a second snapshot, and then compare the two to see what changes have taken place. From those changes, the appropriate movement calculations are made. A human operator performing the same control functions, on the other hand, takes a different approach. The human's predictive approach is one wherein the scene is viewed in real time and the movement is divided into relevant and non-relevant areas. For example, when driving an automobile, the driver sees things immediately in front of the vehicle (the foreground), at a median distance (a middle active region), and in the far distance (the background). When maneuvering the vehicle along the streets, the driver is only interested in the median distance, as it provides the information which is relevant to the movement of the vehicle through the streets. The background is irrelevant except as it relates to movement towards an ultimate goal. Likewise, the foreground area immediately in front of the vehicle relates only to the avoidance of sudden obstacles. Thus, the driver rejects data from the field of view that does not relate to the immediate problem being solved, i.e., steering guidance. There is also a constant prediction of future movement and correction for changes from the prediction. In this way, the driver is able to quickly and accurately perform the necessary steering function from the visual data as they are input and processed. At present, there is no machine vision system which operates in the same manner as a human operator.

Visual psychophysics research has uncovered several important properties that determine the direction a simple object is seen to move when viewed by a human operator. The contrast, position or spatial-phase, spatial frequencies, and temporal duration that characterize the test object and its background affect the direction an object is perceived to move relative to its background. When identifying patterns where a test sinewave grating is shifted in position relative to a stationary textured background composed of single and multiple spatial-frequency components, the visibility of left-right movement was found to be predicted by spatially located paired Gabor filters (paired even- and odd-symmetric filters optimally tuned to a 90° phase difference) summed across the background reference frame. The problem is to apply these discoveries so as to provide a similar ability to determine the direction of object movement in machine vision. The solution is to employ a computer-based, real-time system to process the image signals through paired Gabor filters, using the sums and differences to determine direction and, thereby, emulate the human response in a machine vision environment.

STATEMENT OF THE INVENTION 

Accordingly, it is an object of this invention to provide a machine vision system which provides control information closely related to the information which would be processed by a human operator under similar circumstances.

It is another object of this invention to provide a machine vision control system which can control a robotic type device, or the like, in much the same manner as a human operator would manually perform the same control functions under similar circumstances.

It is yet another object of this invention to provide a machine vision control system for controlling robotic type devices, or the like, in a manner which eliminates unnecessary data from the computations so that the computations performed can be accomplished in real time.

Other objects and benefits of this invention will become apparent from the description which follows hereinafter when taken in conjunction with the drawing figures which accompany it.

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a simplified drawing of a typical scene as viewed by a machine vision system according to the present invention, showing how the scene contains necessary and useful information within a horizontal strip which comprises a middle active region, and contains unnecessary information in background and foreground areas.

FIG. 2 is a simplified drawing showing how the features in the middle active section of FIG. 1 change relationship as a function of their distance from the viewer as the viewer moves past them.

FIG. 3 is a simplified drawing further showing how the features in the middle active section of FIG. 1 change relationship as a function of their distance from the viewer as the viewer moves past them, and how foreground items of non-interest appear and disappear from the scene.

FIG. 4 is a simplified drawing showing how the features in the middle active section of FIG. 1 change relationship as a function of their distance from the viewer as the viewer moves past them, at a later point in time than FIG. 3.

FIG. 5 is a functional block diagram of a machine vision system according to the present invention as mounted and tested on a planetary rover.

FIG. 6 is a flowchart of logic portions of the machine vision system of FIG. 5.

DETAILED DESCRIPTION OF THE INVENTION

This invention is built around a computer-simulated cortical network which computes the visibility of shifts in the direction of movement and computes: 1) the magnitude of the position difference between test and background patterns; 2) localized contrast differences at different spatial scales, analyzed by computing temporal gradients of the difference and sum of the outputs of paired even- and odd-symmetric bandpass filters convolved with the input pattern; and 3) using global processes that pool the output from paired even- and odd-symmetric simple and complex cells across the spatial extent of the background frame of reference, the direction a test pattern moved relative to a textured background. Magnocellular pathways are used to discriminate the direction of movement; consequently, this task is not affected by small pattern changes such as jitter, short presentations, blurring, and the different background contrasts that result when the veiling illumination in a scene changes. For example, the direction of movement of an object in the field of view of a robotic vision system is detected by encoding image intensity: the input image is passed through separable spatiotemporal filters consisting of paired even- and odd-symmetric cells tuned to a 90° spatial-phase difference, so as to produce paired signals that are processed through a low-pass and a bandpass temporal filter, whereby the sums and differences produce nonlinear responses indicating the direction of motion. The processing is accomplished by a neural network computer in accordance with nonlinear Gabor function algorithms. Before beginning a detailed description of the invention itself, however, a description of the basis for human emulation incorporated therein will be provided.

As mentioned above, in human vision and the interpretation thereof, the movement of objects relative to their background is used to infer the 3-dimensional structure and motion of object surfaces. Apparent movement is perceived by the observer when an object appears at one spatial position and then appears at a second nearby spatial position a short time later. The shift in the spatial position of contrast differences over a short time interval induces the perception of movement. The direction an object is perceived to move is judged relative to the background frame of reference, and figure-ground segmentation precedes the determination of the direction of movement. When navigating through the environment, objects are perceived to move relative to a textured stationary background. Both the spatiotemporal characteristics of the object and the background frame of reference are used to determine the perceived direction of movement. Studying the visibility of shifts in the direction a pattern moves relative to a background using discrete, sequentially flashed stimuli has the advantage of providing a more ‘analytic’ stimulus with which to separately manipulate the spatial and temporal aspects of movement. Optical flow, an important cue for navigation, is determined by measuring the rate of movement over time, divided by the rate of movement across space.
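The ratio described here can be read as the familiar gradient constraint for 1-D motion: velocity is approximately minus the temporal intensity gradient divided by the spatial intensity gradient. The sketch below is an illustration of that standard relation, not the patent's own computation; it estimates one velocity for two sampled frames by least squares over all samples. The wrap-around boundary and the use of the two-frame average for the spatial gradient are assumptions chosen to keep it short.

```python
import math

def gradient_velocity(frame1, frame2):
    """Least-squares 1-D velocity (samples per frame) from v * I_x + I_t = 0.

    Assumes equal-length frames with periodic (wrap-around) boundaries.
    """
    n = len(frame1)
    if n != len(frame2) or n < 3:
        raise ValueError("frames must have the same length (at least 3)")
    avg = [0.5 * (a + b) for a, b in zip(frame1, frame2)]
    num = den = 0.0
    for i in range(n):
        ix = 0.5 * (avg[(i + 1) % n] - avg[i - 1])  # central spatial gradient
        it = frame2[i] - frame1[i]                    # temporal gradient
        num += ix * it
        den += ix * ix
    if den == 0.0:
        raise ValueError("no spatial structure: velocity is undefined")
    return -num / den

# Usage: a sinusoid (period 32 samples) shifted by 0.5 samples toward +x.
k = 2 * math.pi / 32
f1 = [math.sin(k * x) for x in range(64)]
f2 = [math.sin(k * (x - 0.5)) for x in range(64)]
v = gradient_velocity(f1, f2)  # about 0.50; the sign gives the direction
```

Swapping the two frames reverses the sign of the estimate, which is the left-right direction judgment in its simplest form.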

The optical phenomenon upon which the invention is based is shown in simplified form in FIGS. 1-4. As a scene 10 is viewed, there is a natural tendency for the scene to fall into a foreground area 12 at the bottom, a middle region 14 which exists as a horizontal strip through the middle of the scene 10, and a background area 16 at the top. A human viewer will typically raise or lower his/her head in order to place the scene into this arrangement; that is, a driver, pilot, pedestrian, etc. will normally look towards an immediate goal in the middle distance in order to perform the primary guidance function and will only glance down to the foreground and up to the background periodically to perform the functions associated therewith. In FIG. 1, the middle area 14 contains a building 18 and a tree 20. The distance between the viewer and the building 18 and tree 20 is not readily apparent from a single view. Likewise, the distance between the building 18 and the tree 20 is not readily apparent from a single view. If the viewer moved towards the building 18 and tree 20, they would grow in perceived size at different rates as a function of their respective distances. Certain decisions could be made therefrom. As the objects approached the viewer and were passed by the viewer, a different phenomenon would take place. It is that passing phenomenon that is of primary importance in machine vision and the one which will be addressed with particularity herein.

As shown in FIG. 2, as the viewer passes the building 18 and tree 20, the background area 16 and foreground area 12 remain constant while the building 18 and tree 20 move with respect to both the background area 16 and each other. It is this movement relationship which allows the scene 10 to be “mapped” into a 3-dimensional representation from the data gathered and, additionally, allows the movement of the objects to be predicted. In FIG. 2, we perceive that the building 18 has moved to the left a small distance while the tree 20 has moved a greater distance. In FIG. 3, we again perceive that the building 18 has moved to the left a small distance while the tree 20 has moved a greater distance. Note also that a rock 22 has appeared within the scene 10. Since it is in the foreground area 12, however, it is not relevant to the mapping of the objects of interest (i.e., the building 18 and tree 20) and, therefore, all data relevant to the apparent “movement” thereof (which would be processed in a typical prior art system) can be discarded. Finally, in FIG. 4, we once again perceive that the building 18 has moved to the left a small distance while the tree 20 has moved a greater distance. From the data provided by the views of FIGS. 1-3, the viewer could have (and subconsciously would have) predicted the direction of movement and final position of FIG. 4. In the present invention, this is exactly what takes place; and the difference between the predicted movement and actual movement is used to update and correct the prediction algorithm employed.
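The predict-and-correct loop described here can be illustrated with a generic alpha-beta tracker, which predicts the next position from its current position and velocity estimates and then nudges both estimates by a fraction of the prediction error. This is a swapped-in textbook technique used only to show the idea; the patent does not specify this update rule, and the class name and the gains `alpha` and `beta` are hypothetical.

```python
class AlphaBetaPredictor:
    """Constant-velocity predictor corrected by its own prediction error."""

    def __init__(self, alpha=0.5, beta=0.3):
        self.alpha = alpha      # hypothetical gain on the position correction
        self.beta = beta        # hypothetical gain on the velocity correction
        self.position = None
        self.velocity = 0.0

    def predict(self):
        """Predicted position one frame ahead."""
        if self.position is None:
            raise ValueError("no observations yet")
        return self.position + self.velocity

    def update(self, observed):
        """Fold in an observed position and return the prediction error."""
        if self.position is None:
            self.position = observed
            return 0.0
        predicted = self.predict()
        error = observed - predicted
        self.position = predicted + self.alpha * error
        self.velocity += self.beta * error
        return error

# Usage: an object that moves 2 units per frame across the field of view.
tracker = AlphaBetaPredictor()
for frame in range(40):
    err = tracker.update(2.0 * frame)
```

For steady motion the prediction error shrinks toward zero within a few dozen frames; a sudden change in motion shows up as a large error, which is the signal used to correct the prediction.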

Machine vision can be used to enable a robotic vehicle to navigate safely in dangerous environments. The machine vision system of the present invention is employed for the tasks of sensing, perception, and navigation, and analyzes the direction of objects moving relative to a background scene by computing: 1) spatial and temporal gradients of the smoothed image at several spatial scales, orientations, and depth planes (preferably using the input from at least two cameras) that are summed across the background frame of reference, and 2) motion parallax, computed by determining the direction of movement relative to the reference depth plane that includes the fixation point, and to depth planes in front of and behind the reference plane. This is depicted in the functional block diagram of FIG. 5, which shows the present invention as actually tested on a planetary rover at the Jet Propulsion Laboratory (JPL). As shown therein, three television cameras 24 were mounted on the vehicle 26 to provide a forward view and views to the two sides. The outputs from the three cameras 24 were connected as an input to the machine vision system 28. In the tested embodiment, the input to the vehicle drive 30 was not actually made, as the purpose was to gather test data for visual analysis and verification. In actual use, the control input to the vehicle drive 30 from the machine vision system 28, as shown in FIG. 5, would be made to allow the machine vision system 28 to control the navigation of the vehicle 26.
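The motion-parallax relationship can be made concrete with an idealized pinhole camera translating sideways without rotating: a point at depth Z shifts in the image by f·T/Z pixels, where f is the focal length in pixels and T the sideways translation, so nearer objects (such as the tree 20 in FIGS. 2-4) shift more than farther ones (such as the building 18). The sketch below inverts that relation and classifies a point against the reference depth plane through the fixation point. The pinhole model, the pure-translation assumption, and all names are assumptions for illustration, not details given in the patent.

```python
def image_shift(depth, focal_px, translation):
    """Image displacement (pixels) of a point at `depth` for a sideways camera move."""
    return focal_px * translation / depth

def depth_from_shift(shift_px, focal_px, translation):
    """Invert image_shift: larger shifts mean nearer points."""
    if shift_px == 0:
        raise ValueError("zero shift: the point is at infinity under this model")
    return focal_px * translation / shift_px

def depth_plane(shift_px, fixation_shift_px, tol=1e-9):
    """Classify a point relative to the depth plane containing the fixation point."""
    relative = shift_px - fixation_shift_px
    if abs(relative) <= tol:
        return "reference plane"
    return "in front" if relative > 0 else "behind"
```

Subtracting the fixation point's shift is what turns raw image motion into motion relative to the reference depth plane: points nearer than fixation then move one way and points farther away move the other.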



5,109,425

As depicted in FIG. 5, the machine vision system 28 of the present invention has the output from the cameras 24 connected to an analog-to-digital converter 32 to provide digital output data which can be processed by digital computing logic. The digital data from the analog-to-digital converter 32 is stored in appropriate image buffers 34 (i.e., there is a buffer for each camera 24). The data within the buffers 34 is processed by image enhancement logic 36, which filters the image data to enhance the borders so as to improve the accuracy of the figure-ground segmentation. In other words, the foreground (ground) data is to be eliminated as being redundant to the immediate problem and, therefore, the accuracy of the segmentation of the data is an important factor. Again in this regard, it is important to remember that a major goal of this invention is the elimination of non-relevant computations so that the necessary computations can be performed in real time. It should be noted as well that the preferred filter as employed herein utilizes the “Normalized Transfer Function” (NTF). A more detailed analysis and description of the use of the NTF in optical systems to improve object identification under adverse conditions can be found in a co-pending application of the inventor hereof entitled LOW VISION IMAGE ENHANCEMENT AID by Teri A. Lawton and Donald B. Gennery, Ser. No. 118,205, filed Nov. 5, 1987. The output from the image enhancement logic 36 is input to reduction and magnification logic 38 and computational logic 40. The output from the reduction and magnification logic 38 is also input to the computational logic 40. The reduction and magnification logic 38 reduces or magnifies the size or density of the image to give multi-scale resolution at several orientations. The computational logic 40 computes smoothed images and spatial gradients at each spatial scale, orientation, position, and depth plane within the background frame of reference.
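A small pure-Python sketch of the reduction and computational stages might look like the following: each level blurs the image with a separable [1, 2, 1]/4 kernel, computes central-difference spatial gradients of the smoothed image, and then halves the resolution for the next level. The kernel, the 2:1 reduction, the edge clamping, and the function names are assumptions; the patent does not specify them, and orientation is represented here only by the x and y gradient components.

```python
def _clamp(i, n):
    return min(max(i, 0), n - 1)

def smooth(img):
    """Separable [1, 2, 1] / 4 blur with clamped (replicated) edges."""
    h, w = len(img), len(img[0])
    rows = [[(img[r][_clamp(c - 1, w)] + 2 * img[r][c] + img[r][_clamp(c + 1, w)]) / 4
             for c in range(w)] for r in range(h)]
    return [[(rows[_clamp(r - 1, h)][c] + 2 * rows[r][c] + rows[_clamp(r + 1, h)][c]) / 4
             for c in range(w)] for r in range(h)]

def gradients(img):
    """Central-difference spatial gradients: gx along columns, gy along rows."""
    h, w = len(img), len(img[0])
    gx = [[(img[r][_clamp(c + 1, w)] - img[r][_clamp(c - 1, w)]) / 2 for c in range(w)]
          for r in range(h)]
    gy = [[(img[_clamp(r + 1, h)][c] - img[_clamp(r - 1, h)][c]) / 2 for c in range(w)]
          for r in range(h)]
    return gx, gy

def downsample(img):
    """Blur, then keep every other row and column (2:1 reduction)."""
    return [row[::2] for row in smooth(img)[::2]]

def multiscale(img, levels):
    """List of (smoothed image, gx, gy), one tuple per spatial scale."""
    out = []
    for _ in range(levels):
        s = smooth(img)
        gx, gy = gradients(s)
        out.append((s, gx, gy))
        img = downsample(img)
    return out
```

On a horizontal intensity ramp, the x gradient doubles at each coarser level while the y gradient stays zero, which is the expected behaviour of a 2:1 pyramid. A real-time system would use optimized array or hardware operations instead of nested Python lists.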

The output of the computational logic 40 interfaces with sensing logic 42, which is in a feedback relationship with the computational logic 40; that is, deviations between predicted movement and position and actual movement and position are fed back to the computational process to improve its accuracy in what can be loosely classed as a learning process, which more nearly approximates the human approach to the problem, as desired. The output from the sensing logic 42 goes to perception logic 44, which constructs and updates a depth map of the scene being viewed using data with respect to the relationship of perceived “moving” objects in the manner described with respect to FIGS. 1-4. The output from the perception logic 44 (i.e., the active depth map) is input to and used by the navigational logic 46 to choose the best path to the desired goal. This navigational data can be input to the vehicle drive 30 to direct the vehicle 26 along a desired course to a desired goal or destination. In the case of a robotic assembly device, for example, the moving object being viewed might be a robotic arm, which could be guided in the same manner by the navigational logic 46 to perform its assembly tasks.

Increasing the complexity of the scene and the amount of noise in the scene (as a result of fog, haze, or rain, for example) requires a larger number of spatial scales, orientations, depth planes, and different time intervals to be analyzed. The smoothed images and the corresponding spatial and temporal gradients of this machine vision system are computed using a distributed network with feedforward and feedback connections. This distributed network can be implemented at real-time frame rates using either high-speed digital hardware or analog resistive networks. The gains that change the signal-to-noise ratios when computing the direction of movement, namely 1) the sensitivity, $k_t$, and contrast threshold, $k_{ft}$, for the test pattern or object, 2) the sensitivity, $k_b$, and contrast threshold, $k_{fb}$, for the background frame of reference, 3) the width of the background frame of reference, $B$, and 4) the slope of the contrast transducer function, $n$, are changed as a result of the feedforward and feedback connections mentioned above, shown in the drawing figures as bidirectional arrows. The constant $C_0$ depends on the temporal frequency, increasing at high temporal frequencies. There are feedforward and feedback connections between: 1) the sensing logic 42, which uses figure-ground segmentation to determine the boundaries of objects moving relative to the background scene, 2) the determination of whether more pictures of the scene are needed as a result of low signal-to-noise ratios, and 3) the perception logic 44 used to update the depth maps. Different types of filtering are used to increase the visibility of moving object borders via feedforward and feedback connections between: 1) the image enhancement logic 36, to improve the accuracy of figure-ground segmentation, and 2) the sensing logic 42. For example, asymmetric bandpass filtering that boosts the amplitudes of the intermediate spatial-frequencies more than the lower spatial-frequencies, in proportion to an observer's contrast sensitivity losses, significantly improved word recognition and reading performance in observers having a reduced spatial resolution.

The proposed machine vision system is based on psychophysical studies of the ability of human observers to discriminate direction of movement. The feedforward and feedback connections between visual areas provide a model of a distributed cortical neural network for discriminating optical flow. Optical flow, an important cue for navigation, is determined by measuring the rate of movement over time, divided by the rate of movement across space. Determining the direction an object moves relative to the observer is used to disambiguate objects from the background frame of reference. The background scene provides the frame of reference that is used when tracking moving objects for pattern recognition. Following perception of the scene, navigation is completed by choosing the path with the least obstacles to reach the destination.
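The final navigation step, choosing the path with the fewest obstacles, can be sketched as a shortest-path search over an occupancy grid derived from the depth map: each step costs 1 and entering an obstacle cell adds a large penalty, so Dijkstra's algorithm returns the shortest route among those crossing the fewest obstacles, provided the penalty exceeds the length of any candidate path. The grid representation, the penalty value, and the function name are hypothetical; the patent does not say how the path is chosen.

```python
import heapq

def least_obstacle_path(grid, start, goal, obstacle_penalty=1000):
    """Dijkstra over a 4-connected grid; grid[r][c] is 1 for an obstacle, 0 if clear."""
    rows, cols = len(grid), len(grid[0])
    best = {start: 0}
    came_from = {}
    heap = [(0, start)]
    while heap:
        cost, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if cost > best[cell]:
            continue  # stale heap entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols:
                new_cost = cost + 1 + obstacle_penalty * grid[nr][nc]
                if new_cost < best.get(nxt, float("inf")):
                    best[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(heap, (new_cost, nxt))
    if goal not in best:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(came_from[path[-1]])
    return path[::-1]
```

Making obstacles expensive rather than impassable keeps a route available even when every path is blocked, while still steering around obstacles whenever a clear detour exists.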

In implementing an actual system according to the present invention, several factors emerged that increase the visibility of the direction of movement and must be considered in the design, parameterization, and implementation. Visibility increased:

1) as the spatial-phase difference (between the peak luminance of the test pattern and that of the background) increased from a minimum phase difference of 4° up to a maximum phase difference of 90°. Increasing the phase difference from 90° to 180° did not change the visibility of shifts in the direction of movement. Paired Gabor filters, which are orthogonal filters tuned to a spatial-phase difference of 90° (i.e., in quadrature phase), predict these results.

2) as the temporal duration was increased, when measuring the minimum spatial-phase difference needed to discriminate the direction of movement. Yet, once the test and background spatial-frequencies differ optimally in spatial phase (by 90°), increasing the temporal frequency by decreasing a) the duration of the test and comparison patterns from 750 msec down to 50 msec (from 1.3 to 20 cyc/sec), or b) the duration between the test and comparison patterns from 500 msec down to 50 msec (from 2 to 20 cyc/sec), did not change the contrast needed to discriminate the direction of movement; nor did c) increasing the temporal frequency up to 20 cyc/sec. To determine specificity, it is important to present patterns that optimally activate the mechanisms being studied.

3) as the contrast of a low-frequency background was increased from 0.5% to 2%. On the other hand, increasing the contrast of a background consisting of a single middle spatial-frequency from 1% to 16% significantly reduced the visibility of shifts in the direction of movement. However, if the background was composed of several middle spatial-frequencies that repeat over a wide 1° area, then increasing the effective contrast of the background from 0.6% to 20% did not reduce the visibility of shifts in the direction of movement. The contrast needed to discriminate the direction a test pattern moved relative to a clearly visible wide background frame of reference is low, averaging 1%. Since increasing the contrast of the middle spatial-frequency components of the background did not reduce the visibility of apparent movement, as it did for backgrounds composed of a single spatial-frequency, it is the wide spatial extent of the background (analyzed by global processes in the cortex), and not the individual background frequencies, that provides the frame of reference used to discriminate the direction of movement.

4) when the test spatial-frequency was a harmonic of the background's fundamental frequency, that is, when it repeats within the background frame of reference, as opposed to when the test frequency is not a higher harmonic of the background's fundamental frequency. Global contrast differences that correspond to the width of the textured background provide the frame of reference used to discriminate the direction of movement.

5) when the test pattern was shifted relative to a wide background, such as 6+7+8 cyc/deg, which repeats over a 1° area, as opposed to being shifted relative to a narrow background, such as 6 cyc/deg, which repeats over a 0.17° area. Contrast thresholds are low for twice as wide a range of spatial-frequencies (3 octaves as opposed to 1.5 octaves) when the test is added to a wide background composed of middle spatial-frequencies, such as 6+7 cyc/deg, as opposed to when it is added to a single middle spatial-frequency background, such as 6 cyc/deg. All scales higher in spatial-frequency than the fundamental frequency of the background pattern are used to detect the test pattern and discriminate the direction the test pattern moves. The range of scales used to discriminate the direction of movement is determined by the fundamental frequency of the background, that is, by the width of the background frame of reference.

6) as the contrast of the middle background spatial-frequencies (5+7+9 cyc/deg that repeats over a 1° area) is selectively reduced by placing a blur glass in front of the patterns. The global low-frequency contrast differences within the background provide the frame of reference for discriminating movement. The higher component background spatial-frequencies are not as important for discriminating the direction of movement as is the wide frame of reference that corresponds to the spatial extent of the background.
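The role of the background's fundamental frequency in items 4) to 6) can be illustrated with simple arithmetic. The following is a minimal sketch under stated assumptions, not part of the tested system: it assumes the fundamental is the greatest common divisor of the component frequencies (in cyc/deg), so that its reciprocal gives the width B of the background frame of reference, and it treats a test frequency as harmonic when it is an integer multiple of that fundamental. The function names are hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def _lcm(a: int, b: int) -> int:
    return a * b // gcd(a, b)


def fundamental_frequency(components) -> Fraction:
    """Greatest common divisor of the component frequencies (cyc/deg)."""
    fracs = [Fraction(c).limit_denominator(1000) for c in components]
    # For fractions in lowest terms: gcd = gcd(numerators) / lcm(denominators)
    num = reduce(gcd, (q.numerator for q in fracs))
    den = reduce(_lcm, (q.denominator for q in fracs))
    return Fraction(num, den)


def background_period_deg(components) -> float:
    """Width of the background frame of reference, B, in degrees (assumed 1 / fundamental)."""
    return float(1 / fundamental_frequency(components))


def is_harmonic(test_freq, components) -> bool:
    """True when the test frequency is an integer multiple of the background fundamental."""
    ratio = Fraction(test_freq).limit_denominator(1000) / fundamental_frequency(components)
    return ratio.denominator == 1


print(background_period_deg([6, 7, 8]))   # wide background: 1.0 deg
print(background_period_deg([6]))         # narrow background: about 0.17 deg
print(is_harmonic(3, [6, 7, 8]), is_harmonic(9, [6]))  # True False
```

With these assumptions the 6+7+8 and 5+7+9 cyc/deg backgrounds both repeat over 1°, while a 6 cyc/deg background repeats over about 0.17°, matching the widths quoted above.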

Both the global analysis of groups of edges across the background frame of reference and the localized computation of contrast differences by paired even- and odd-symmetric Gabor filters within the frame of reference are used to determine the direction of movement. Both the spatial-frequency composition and the spatial extent of the interval over which the test and background patterns repeat change the visibility of the direction of movement. The importance of spatial-frequency is related to the bandpass filtering performed by the channels used to detect different spatial-frequency components. The importance of spatial extent is related to the use of a wide background frame of reference to improve the visibility of left-right movement discrimination. Only when discriminating movement relative to a multi-frequency background, instead of a background consisting of a single spatial-frequency, can the relative contributions of 1) global contrast differences corresponding to the background frame of reference and 2) localized contrast differences between individual frequency components be analyzed independently. The results found with patterns that optimally activate direction-selective mechanisms indicate that the outputs of paired Gabor filters (in quadrature phase) are summed across the background frame of reference to discriminate the direction of movement.

In the tested implementation of the present invention, as shown in FIGS. 5 and 6, the outputs from paired Gabor filters summed across the background frame of reference are used to determine the direction of movement. A pattern's image intensity is encoded by separable spatiotemporal filters, consisting of paired even- and odd-symmetric simple cells (bar and edge detectors) in quadrature phase, optimally tuned to a 90° spatial-phase difference. For each combination of paired even- and odd-symmetric filters at each spatial scale being analyzed, the input signal passes through the spatial and temporal filters to produce four separable responses: each paired spatial filter is processed during two different time intervals by both 1) a low-pass temporal filter and 2) a bandpass temporal filter optimally tuned to 10 cyc/sec. Sums and differences of paired filters in quadrature phase are taken to produce the spatiotemporally-oriented nonlinear responses that are selective for the direction of motion. The mathematical expressions that describe an even-symmetric Gabor function, F_ES, and an odd-symmetric Gabor function, F_OS, are:

$$F_{ES}(f, x, \sigma^2) = \cos(2\pi f x)\, e^{-(x - x_0)^2 / 2\sigma^2}$$

$$F_{OS}(f, x, \sigma^2) = \sin(2\pi f x)\, e^{-(x - x_0)^2 / 2\sigma^2}$$

where f corresponds to the spatial-frequency of the pattern, x corresponds to the horizontal spatial position being examined, x_0 corresponds to the observer's fixation point, and σ² corresponds to the variability within the pattern's spatial period in locating the position of the most salient contrast difference, x^.
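The four separable responses and the sums and differences taken from them can be sketched numerically. This is an illustrative sketch, not the tested implementation: the spatial filters follow F_ES and F_OS above with x_0 = 0, the low-pass temporal filter is assumed to be a first-order exponential, and the bandpass temporal filter is assumed to be the difference of a fast and a slow exponential, which for the time constants chosen peaks near 10 cyc/sec. The squared sums and differences are combined in the manner of an opponent motion-energy model (the Adelson-Bergen construction), so the output is positive for a rightward-drifting grating and negative for a leftward one. All function names and parameter values are hypothetical.

```python
import numpy as np


def exp_lowpass(tau, dt, n):
    """Assumed first-order low-pass impulse response, sampled at dt."""
    t = np.arange(n) * dt
    return (dt / tau) * np.exp(-t / tau)


def causal_filter(signal, kernel):
    """Causal convolution truncated to the length of the input."""
    return np.convolve(signal, kernel)[: len(signal)]


def direction_signal(stim, x, dt, f=4.0, sigma=0.25,
                     tau_fast=0.005, tau_slow=0.05, skip=300):
    """Opponent output of paired Gabor filters: > 0 rightward, < 0 leftward.

    stim has shape (n_times, n_positions); x is in deg and dt in s.
    """
    dx = x[1] - x[0]
    envelope = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    even = np.cos(2.0 * np.pi * f * x) * envelope   # F_ES with x_0 = 0
    odd = np.sin(2.0 * np.pi * f * x) * envelope    # F_OS with x_0 = 0
    e = stim @ even * dx   # even-symmetric spatial response over time
    o = stim @ odd * dx    # odd-symmetric spatial response over time

    n = len(e)
    lowpass = exp_lowpass(tau_slow, dt, n)
    # Difference of exponentials: bandpass peaking near 1/(2*pi*sqrt(tau_fast*tau_slow)), about 10 Hz
    bandpass = exp_lowpass(tau_fast, dt, n) - lowpass

    # The four separable responses
    e_lo, e_bp = causal_filter(e, lowpass), causal_filter(e, bandpass)
    o_lo, o_bp = causal_filter(o, lowpass), causal_filter(o, bandpass)

    # Sums and differences give two squared, direction-selective pairs
    rightward = (e_lo + o_bp) ** 2 + (e_bp - o_lo) ** 2
    leftward = (e_lo - o_bp) ** 2 + (e_bp + o_lo) ** 2
    # Discard the start-up transient before averaging
    return float(np.mean(rightward[skip:] - leftward[skip:]))


x = np.arange(-1.0, 1.0, 0.01)            # deg
t = np.arange(0.0, 1.0, 0.001)[:, None]   # s
right = np.cos(2.0 * np.pi * (4.0 * x - 10.0 * t))  # drifts rightward at 10 cyc/sec
left = np.cos(2.0 * np.pi * (4.0 * x + 10.0 * t))   # drifts leftward at 10 cyc/sec
print(direction_signal(right, x, 0.001) > 0, direction_signal(left, x, 0.001) < 0)  # True True
```

The sign of the output comes from the phase difference between the two temporal filters at the stimulus frequency: if both filters had the same phase, the rightward and leftward terms would be equal and the output would vanish.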

Gabor filters were chosen because the receptive fields of the even-symmetric and odd-symmetric simple cells in the visual cortex that are used to discriminate the direction of movement can be characterized by Gabor filters. A Gabor filter is either a sine or a cosine multiplied by a Gaussian function. Gabor functions optimize resolution on a linear scale in both the spatial-position and spatial-frequency domains, and they optimize processing across space and over time. The Gaussian function acts as a spatially-localized smoothing function, significantly reducing the sensitivity of the cell as the pattern moves away from the center of the cell's receptive field. As the variance of the Gaussian filter is reduced: 1) the width of the sidebands of the even-symmetric filter is reduced relative to the center, and 2) the spatial-frequency bandwidth of the filter is reduced. Increasing the variance of the Gaussian filter increases the number of sidelobes that are encoded by the filter. The sine and cosine functions of the Gabor filter act like a bandpass spatial-frequency function that extracts the contrast at each spatial position being examined. An even-symmetric filter computes the smoothed contrast at each spatial position being examined, whereas an odd-symmetric filter computes the corresponding contrast difference, or spatial gradient. Therefore, by examining the outputs of paired even- and odd-symmetric Gabor filters, both the smoothed contrast at each spatial position being examined and the corresponding spatial gradients are computed. The spatially-localized paired sine and cosine functions enable the position of contrast differences to be measured using the smallest number of channels. Paired Gabor filters provide an encoding scheme for the visual cortex that maximizes signal-to-noise ratios given a fixed number of neurons. The need for paired even- and odd-symmetric filters to predict the direction of movement is also indicated by psychophysical data showing that increasing the position difference between the peak luminance of the test and background patterns, from a minimum phase difference up to 90°, increased the visibility of shifts in the direction of movement, while any larger phase difference did not affect movement discrimination.

For test and background patterns whose peak luminances differ in position by 90°, an even-symmetric filter would be used to detect the background, and an odd-symmetric filter would be used to detect the direction the test grating moved relative to the background.
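The phase dependence described above can be checked with a short numerical sketch. Assuming the paired filters are centred on the background's peak luminance (x_0 = 0) and the test grating's peak is displaced by a spatial phase φ, the even-symmetric output falls roughly as cos φ while the odd-symmetric output rises roughly as sin φ, so the odd output relative to the even output grows until φ = 90°. The grating frequency, envelope width, and function name are illustrative assumptions.

```python
import numpy as np


def paired_gabor_outputs(phase_deg, f=4.0, sigma=0.5, dx=0.001):
    """Even- and odd-symmetric outputs (x_0 = 0) for a grating displaced by phase_deg."""
    x = np.arange(-3.0, 3.0, dx)                       # deg
    envelope = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    even = np.cos(2.0 * np.pi * f * x) * envelope       # F_ES
    odd = np.sin(2.0 * np.pi * f * x) * envelope        # F_OS
    grating = np.cos(2.0 * np.pi * f * x - np.deg2rad(phase_deg))
    return float(np.sum(even * grating) * dx), float(np.sum(odd * grating) * dx)


for phase in (0, 30, 60, 90):
    e, o = paired_gabor_outputs(phase)
    print(f"{phase:3d} deg  even={e:+.3f}  odd={o:+.3f}")
```

At 0° only the even-symmetric filter responds; at 90° only the odd-symmetric filter does, which is the condition under which the odd filter carries the spatial gradient used to judge the direction of the shift.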

As mentioned previously, by examining the outputs of paired even- and odd-symmetric Gabor filters, both the smoothed contrast at each spatial position being examined and the corresponding spatial gradients are computed. Many prior art computational models that predict movement implement a smoothing operation over the moving object and then compute the spatial and temporal gradients of the smoothed object. These models, like those proposed to account for movement discrimination for simple patterns, do not take into account the importance of: 1) paired even- and odd-symmetric filtering functions, 2) global contrast differences that correspond to the background frame of reference, 3) a nonlinear contrast response, and 4) the spatial and temporal thresholds that are inherent when computing spatial and temporal gradients. Therefore, these prior art computational models cannot and do not incorporate all of the psychophysical results that must be accounted for by a robust model that predicts the direction of movement.

Previous models that propose using paired Gabor filters to predict the direction of movement are extended in the present invention, and made practical for actual use in a real-time control environment, by including: 1) threshold mechanisms dependent on a pattern's spatial and temporal frequencies, 2) a nonlinear contrast transducer function, and 3) summing the outputs of the difference and sum of paired even- and odd-symmetric filters over the background's spatial extent, which provides the frame of reference used to judge the direction of movement at all spatial scales that are higher in spatial-frequency than the background's fundamental frequency.

The direction a pattern moves relative to its background at two different times is discriminated whenever the output from mechanisms that compute the difference and sum of paired even- and odd-symmetric bandpass channels pooled across the background frame of reference exceeds threshold. The model shown with particularity in the computational logic block 40 of FIG. 6 is employed to predict the visibility of shifts in the direction a pattern (detected by temporal bandpass F_OS) moves relative to a multi-frequency background (detected by temporal lowpass F_ES). In particular, the F_ES term plus and minus the F_OS term, summed across the background frame of reference, is computed at time t2 and compared with the same quantity at time t1:

$$\left.\sum_{x=x_0-B/2}^{x_0+B/2}\left[\frac{k_b C_b^{\,n} + k_{fb}}{C_b^{\,n} + C_0}\,F_{ES} \pm \frac{k_t C_t^{\,n} + k_{ft}}{C_t^{\,n} + C_0}\,F_{OS}\right]\right|_{t_2} - \left.\sum_{x=x_0-B/2}^{x_0+B/2}\left[\frac{k_b C_b^{\,n} + k_{fb}}{C_b^{\,n} + C_0}\,F_{ES} \pm \frac{k_t C_t^{\,n} + k_{ft}}{C_t^{\,n} + C_0}\,F_{OS}\right]\right|_{t_1}$$

where C_t corresponds to the contrast of the test pattern, k_ft corresponds to the contrast threshold for the test frequency, C_b corresponds to the contrast of the background pattern, k_fb corresponds to the contrast threshold for the background frequencies, C_0, which depends on the temporal frequency, is a constant that corresponds to the signal-to-noise ratio used when detecting left-right movement of a test pattern relative to the background, n corresponds to the slope of the contrast response function (usually n is approximately 2, since the contrast response functions of simple and complex cells in the striate cortex are rectified), m usually is 1 but sometimes equals 2 as a result of rectification, which only occurs at high temporal frequencies, B corresponds to the spatial period of the background frame of reference, x_0 corresponds to the zero-crossing or contrast difference that the observer uses as the point of reference to judge the direction the test pattern moved between the two pattern presentations at t1 and t2, and k_t and k_b are constants for the gain of the contrast sensitivity that is changed by feedback. The difference and sum of paired even- and odd-symmetric filters are summed across the background frame of reference, and then differenced at times t2 and t1 to discriminate the direction of movement. If the observer is practiced and has stored a template of the phase-shifted test pattern, only one temporal interval is needed to identify the direction of movement by using the texture pattern of contrast differences. However, the outputs from both temporal intervals must be compared to identify the direction of movement using the lowest contrasts possible.
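One way to read the pooled model as code is sketched below. It is a sketch under stated assumptions, not the tested VAX implementation: the filter outputs F_ES and F_OS are taken as given arrays sampled at positions x, the contrast weights follow the hyperbolic terms of the model above (the exponent m is not modeled), and the final threshold test is an assumed decision rule. The function names and numerical values are hypothetical.

```python
import numpy as np


def transducer_weight(contrast, gain, threshold, c0, n=2.0):
    """Hyperbolic weight (gain * C**n + threshold) / (C**n + C0)."""
    c_n = contrast ** n
    return (gain * c_n + threshold) / (c_n + c0)


def pooled_sum_and_difference(f_es, f_os, x, x0, B, w_b, w_t):
    """Pool w_b*F_ES + w_t*F_OS and w_b*F_ES - w_t*F_OS over x0 - B/2 <= x <= x0 + B/2."""
    in_frame = np.abs(x - x0) <= B / 2.0
    es = w_b * f_es[in_frame]
    os_ = w_t * f_os[in_frame]
    return float(np.sum(es + os_)), float(np.sum(es - os_))


def temporal_gradients(frame_t1, frame_t2, x, x0, B, w_b, w_t):
    """Change (t2 minus t1) of the pooled sum and of the pooled difference.

    Each frame is a pair (f_es, f_os) of filter outputs sampled at positions x.
    """
    s1, d1 = pooled_sum_and_difference(*frame_t1, x, x0, B, w_b, w_t)
    s2, d2 = pooled_sum_and_difference(*frame_t2, x, x0, B, w_b, w_t)
    return s2 - s1, d2 - d1


def movement_discriminated(grad_sum, grad_diff, threshold):
    """Assumed decision rule: a pooled signal must change by more than threshold."""
    return max(abs(grad_sum), abs(grad_diff)) > threshold


# Illustrative values only: background contrast 20%, test contrast 2%
w_b = transducer_weight(0.20, gain=1.0, threshold=0.001, c0=0.005)
w_t = transducer_weight(0.02, gain=1.0, threshold=0.001, c0=0.005)
x = np.linspace(-0.5, 0.5, 5)         # positions within one background period B = 1
frame_t1 = (np.ones(5), np.zeros(5))  # even output from the background, no test gradient yet
frame_t2 = (np.ones(5), np.array([0.0, 0.1, 0.2, 0.1, 0.0]))
g_sum, g_diff = temporal_gradients(frame_t1, frame_t2, x, 0.0, 1.0, w_b, w_t)
print(g_sum, g_diff, movement_discriminated(g_sum, g_diff, threshold=0.05))
```

Because the background's even-symmetric output is unchanged between the two frames in this example, the pooled sum and pooled difference change by equal amounts of opposite sign, and the background weight w_b drops out of both gradients.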

As shown in the logic flowchart of FIG. 6, the logic flow from the computational logic 40 is to query block 48, where the logic determines whether the analysis performed by the computational logic 40 has been performed at all spatial scales, orientations, positions, and depth planes provided for. If not, the logic flow returns to the reduction and magnification logic 38 and the computational logic 40 to perform the additional image orientations and computations. When the task is complete, the temporal gradients of the data are computed at block 50 and the logic flow moves to the sensing logic 42. Note that in the feedforward/feedback approach of the present invention, if additional data of a scene is required for the sensing logic to perform its function, the logic may transfer control back to query block 48 to determine whether all possible data is present. When the sensing logic 42 finishes its task with respect to the data of a particular view, it transfers control to query block 52, where the logic determines whether all the views required to construct and/or update a useful depth map have been analyzed. In other words, a single view does not provide sufficient data on the field of view upon which a 3-dimensional interpretation can be made. The analysis process is repeated for the number of views required to provide sufficient data on the relative movement of the objects (as in the simplified drawings of FIGS. 1-4) such that a meaningful depth map can be constructed (or updated) from that data. Until that point is reached, the logic flow returns to the image enhancement logic 36 to process the next view from the cameras 24 as input to the digital input buffers 34. When enough data is present, query block 52 transfers control to perception logic 46.
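The control flow of FIG. 6 can be summarized as a loop. The sketch below is schematic: every callable passed in is a hypothetical stand-in for the corresponding logic block, the "settings" list stands in for the spatial scales, orientations, positions, and depth planes checked by query block 48, and the feedback path from the sensing logic 42 back to query block 48 is not modeled.

```python
from typing import Any, Callable, Iterable, List, Sequence


def process_views(
    views: Iterable[Any],
    settings: Sequence[Any],
    analyze: Callable[[Any, Any], Any],
    temporal_gradient: Callable[[List[Any]], Any],
    sense: Callable[[Any], Any],
    enough_views: Callable[[List[Any]], bool],
) -> List[Any]:
    """Schematic FIG. 6 loop; every callable is a hypothetical stand-in."""
    sensed: List[Any] = []
    for view in views:  # next view from the input buffers 34 via enhancement logic 36
        # Logic 38/40, repeated until query block 48 finds every setting analyzed
        results = [analyze(view, s) for s in settings]
        gradients = temporal_gradient(results)   # block 50
        sensed.append(sense(gradients))          # sensing logic 42
        if enough_views(sensed):                 # query block 52
            break                                # hand off to perception logic 46
    return sensed


print(process_views(range(5), [1, 2], lambda v, s: v * s, sum,
                    lambda g: g, lambda done: len(done) >= 3))  # [0, 3, 6]
```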

To compute the direction a test pattern moves relative to a multi-frequency background, the difference and sum of the outputs of paired Gabor filters computed during each of the temporal intervals being examined must be computed and compared. Leftward movement is signaled by the difference between the outputs of paired even- and odd-symmetric functions, whereas rightward movement is signaled by the sum of the outputs of paired even- and odd-symmetric filters. Left-right movement at one spatial position is discriminated when the difference or sum of paired even- and odd-symmetric filters at time t1 differs significantly from the difference or sum at time t2. Squaring the output of paired filters introduces a component at the difference frequency, which plays an important role in discriminating the direction of movement. The sign of the temporal gradient determines whether the test pattern moved to the right or to the left of the background's peak luminance.
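The effect of squaring on a compound grating can be seen in a small numerical example. Assuming a 7 cyc/deg test and a 6 cyc/deg background sampled over a 1° window, the linear sum has no energy at the 1 cyc/deg difference frequency, while its square does, since 2 cos a cos b = cos(a + b) + cos(a - b). The sampling density and function name are illustrative choices.

```python
import numpy as np


def difference_frequency_amplitude(f_test, f_background, n=256):
    """FFT magnitude at |f_test - f_background| before and after squaring.

    Frequencies are in cycles per sampled window and must be integers below n/2.
    """
    x = np.arange(n) / n
    linear = np.cos(2 * np.pi * f_test * x) + np.cos(2 * np.pi * f_background * x)
    squared = linear ** 2   # rectification modeled as squaring
    k = abs(f_test - f_background)
    return abs(np.fft.rfft(linear)[k]), abs(np.fft.rfft(squared)[k])


lin, sq = difference_frequency_amplitude(7, 6)
print(lin, sq)  # lin is essentially zero; sq is about n/2 = 128
```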

The contrast of each pattern component is detected using filters tuned to spatial-frequencies having a bandwidth of 1 to 1½ octaves. A hyperbolic contrast response function, which optimally characterizes the contrast transducer function of normal observers and of simple and complex cells in the striate cortex, is used to predict the minimum contrast needed to discriminate the direction of movement. The hyperbolic contrast transducer function is a nonlinear function that contains both a nonlinear power function for the contrast of the test pattern, C^n, and the thresholding factors C_0, k_ft, and k_fb.
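The 1 to 1½ octave bandwidths quoted above can be related to the Gaussian envelope of the Gabor filters. The sketch below assumes the envelope e^(-x²/2σ²), so that the frequency-domain envelope has standard deviation 1/(2πσ), uses a half-amplitude criterion for the bandwidth, and ignores the lobe at negative frequency; the function names are hypothetical.

```python
import math

# Distance from a Gaussian's peak to its half-amplitude point, in standard deviations
HALF_AMPLITUDE = math.sqrt(2.0 * math.log(2.0))


def sigma_for_bandwidth(f, octaves):
    """Spatial sigma (deg) giving a Gabor filter at f (cyc/deg) the stated octave bandwidth."""
    r = 2.0 ** octaves
    half_width = f * (r - 1.0) / (r + 1.0)   # cyc/deg from f to each half-amplitude point
    return HALF_AMPLITUDE / (2.0 * math.pi * half_width)


def bandwidth_octaves(f, sigma):
    """Half-amplitude bandwidth, in octaves, of a Gabor filter at f with spatial sigma."""
    # The frequency-domain envelope has standard deviation 1 / (2 * pi * sigma)
    half_width = HALF_AMPLITUDE / (2.0 * math.pi * sigma)
    if half_width >= f:
        raise ValueError("envelope too narrow: the lower half-amplitude point is not positive")
    return math.log2((f + half_width) / (f - half_width))


for octaves in (1.0, 1.5):
    sigma = sigma_for_bandwidth(6.0, octaves)
    print(octaves, round(sigma, 4), round(bandwidth_octaves(6.0, sigma), 3))
```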

The model employed in a system according to the present invention to discriminate the direction of movement predicts the psychophysical results; that is, the model predicts that the output of odd-symmetric filters relative to paired even-symmetric filters will increase as the spatial-phase difference increases, up to 90°. There is a trade-off between 1) the phase difference between the test and background patterns and 2) the contrast of the test frequency that is needed to discriminate the direction of movement. As the spatial-phase difference is increased up to 90°, smaller contrasts are needed to discriminate the direction of movement. There is both a minimum spatial position and a minimum temporal duration that are needed to discriminate the direction a pattern moves relative to the background. Movement discrimination requires that the relative activation of both the test and background patterns, as measured by the output of paired Gabor filters (optimally tuned to 90° spatial-phase differences), be above threshold.

When the contrast of the background frame of reference is above threshold and below the saturation level of its contrast response function, for example from 0.6% to 20%, the visibility of the direction of movement is not changed by the contrast of the background. The model proposed above predicts this result because, at each temporal interval, the output of odd-symmetric filters, which compute the spatial gradient at each position being examined, is both summed with and differenced from the output of paired even-symmetric filters that are activated by the background frame of reference. As long as the difference between the sum and the difference of the outputs of paired Gabor filters is greater than a threshold amount, the direction of movement is discriminated. The contrast threshold for the test frequency is independent of the contrast of a suprathreshold background, since (C_b + C_t) + (C_t - C_b) = 2C_t.

The background pattern, composed of spatial-frequencies combined in cosine phase, acts as the frame of reference for judging the direction a pattern moves relative to a multi-frequency background. Since the low fundamental frequency of the background is more important than the higher component spatial-frequencies, the multi-frequency background that repeats over a wide 1° area operates like a low-pass filter. The increased visibility of the direction of movement when judged relative to a wide, as opposed to a narrow, background frame of reference has been found psychophysically. In addition, the direction of movement is visible at lower contrasts when the test spatial-frequency is a harmonic of the background spatial-frequencies, that is, when it repeats within the background frame of reference, than when the test and background frequencies are not harmonically related. The low-pass filtering that determines the background frame of reference is represented in the model as the sum, across the spatial period of the background, B, of the output of the difference and sum of paired Gabor filters, at all spatial scales that are an integral multiple of the background's fundamental frequency. The most salient contrast differences, x^, are identified and used as a point of reference within the background frame of reference to judge the direction a pattern moves relative to a multi-frequency background.

It has been found through testing that discriminating the direction of movement is a task performed in the human cortex by paired even-symmetric and odd-symmetric simple cells consisting of excitatory and inhibitory subfields. In the present invention, a simulated neural network is used to discriminate the direction of movement in like manner. The finding that the receptive fields of even- and odd-symmetric simple cells vary in size allows an image to be coded at different spatial scales. There are cells tuned to all different spatial-phase angles; however, paired even- and odd-symmetric simple cells in visual area V1 of the cortex are optimally tuned to a spatial-phase difference of 90°. These results are consistent with psychophysical studies of the direction a test pattern moved relative to simple and multi-frequency backgrounds, which found that 1) the center phase did not change the visibility of the direction of movement, and only the phase difference changed an observer's sensitivity for discriminating the direction of movement, and 2) the visibility of shifts in the direction of movement was optimally tuned to a phase difference of 90°.

The neural network for discriminating the direction of movement that is suggested by psychophysical and neurophysiological data is one in which the outputs of paired even- and odd-symmetric simple and complex cells in the visual cortex, activated by both local and global contrast differences, are used to discriminate the direction of shifts in the positions of contrast differences over time. Recent physiological data found a nonlinear output response from X-like cells in the Lateral Geniculate Nucleus (LGN) and complex cells in the striate cortex that corresponds to the difference frequency of a compound grating. A difference in spatial-frequency between the test and background is needed to discriminate the direction of movement, as found in the inventor's laboratory and others. The nonlinear component that corresponds to the amplitude of the output at the difference frequency can be predicted as a result of rectification (a squaring of the output response). The output of cortical cells is nonlinear, primarily due to: 1) thresholds that must exceed the spontaneous cortical activity, 2) rectification of the output response, 3) changes in the gain, such as those found following contrast adaptation, and 4) saturation of the contrast response working range. All of these nonlinearities are incorporated in the model used in the tested embodiment of the present invention. The outputs of paired even- and odd-symmetric simple cells at different spatial scales are squared as a result of rectification. The sum and difference of the outputs of paired cells, implemented by pairs of simple cells having opposite polarity, are summed across the background frame of reference. Low-frequency contrast differences corresponding to the background frame of reference provide the fundamental scale that is used to compute the direction of movement. The contrast differences induced by the position difference between the test and background gratings are computed at all spatial scales tuned to spatial-frequencies equal to and higher than the background's fundamental frequency.

To test the basis of the present invention, a simulated neural network that computes the visibility of shifts in the direction of movement was constructed and tested. That network computed: 1) the magnitude of the position difference between the test and background patterns, 2) localized contrast differences at different spatial scales, analyzed by computing temporal gradients of the difference and sum of the outputs of paired even- and odd-symmetric bandpass filters convolved with the input pattern, and 3) the pooled outputs of paired even- and odd-symmetric simple and complex cells across the spatial extent of the background frame of reference, using global processes. Evidence was gained that magnocellular pathways are used to discriminate the direction of movement using patterns that optimize an observer's ability to discriminate the direction a pattern moved relative to a textured background. Since magnocellular pathways are used to discriminate the direction of movement, the task is not affected by small pattern changes such as jitter, short presentations, blurring, and the different background contrasts that result when the veiling illumination in a scene changes.
GENERATION OF DEPTH MAPS

The generation of a depth map is an important aspect of the method of the present invention when performing under actual operating conditions (as described briefly with respect to the perception logic 46 of FIGS. 5 and 6). There are several approaches for generating depth maps: 1) stereo correlation, which uses the disparity between the same image in the scene recorded from two different perspectives at the same time, 2) active ranging, which emits wave patterns (laser, microwave, or sound) and determines the time for the signal to return after hitting each object of interest in the scene, and 3) motion parallax, which measures the velocity gradients between different points in the scene (as described with respect to FIGS. 1-4). Both stereo correlation and active ranging require static images for matching to construct the depth map of a scene, whereas motion parallax requires dynamic images. Thus, motion parallax is the only technique that is useful for constructing and updating the depth map of a scene while the vehicle is in motion. When the vehicle is not moving and static images must be used to construct the depth map, stereo correlation is preferred to active ranging because 1) it is not emissive, 2) it has much simpler hardware requirements, and 3) it has much better near-field resolution. With stereo matching, the resolution decreases in proportion to the square of the distance. Using motion parallax, distant objects can be ranged at high resolution by recording images along distances orthogonal to the line of sight being used to generate the depth map. Moving the vehicle over a longer distance is equivalent to increasing the baseline between the stereo cameras, which increases the amount of disparity between pictures at far distances. Thus, both close and distant objects can be ranged at high resolution when motion parallax is used to construct the depth map. In addition, motion parallax has much simpler hardware requirements than either stereo correlation or active ranging, since motion parallax requires only one camera to analyze the scene.
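The resolution argument can be made concrete with the standard pinhole triangulation relation Z = f·b/d, where f is the focal length in pixels, b the baseline and d the disparity. This textbook approximation is assumed here for illustration, with the baseline for motion parallax taken as the distance the camera travels orthogonal to the line of sight between views; the camera values and function names are hypothetical. Differentiating gives a depth step of about Z²/(f·b) per pixel of disparity, which grows with the square of the distance and shrinks as the baseline grows.

```python
def depth_from_disparity(disparity_px, baseline_m, focal_px):
    """Pinhole triangulation: Z = f * b / d (f and d in pixels, b in metres)."""
    return focal_px * baseline_m / disparity_px


def depth_step(depth_m, baseline_m, focal_px, disparity_step_px=1.0):
    """Approximate depth change for a disparity change of disparity_step_px: Z**2 * dd / (f * b)."""
    return depth_m ** 2 * disparity_step_px / (focal_px * baseline_m)


focal_px = 1000.0  # hypothetical camera
for depth in (10.0, 20.0, 40.0):
    # 0.5 m: a fixed stereo baseline; 5.0 m: ten times more travel between views
    print(depth, depth_step(depth, 0.5, focal_px), depth_step(depth, 5.0, focal_px))
```

Doubling the distance quadruples the depth step, while a tenfold longer baseline reduces it tenfold, which is the sense in which moving the vehicle farther between views lets distant objects be ranged at high resolution.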

The system of this invention in its tested embodiment determines the 2-dimensional position and size of objects and uses motion parallax to convert them into a 3-dimensional depth map in the perception logic 46. A sequence of video images taken at small distance intervals by horizontally scanning a natural scene using cameras mounted on the current Mars Rover prototype (in the manner of FIG. 5) was used to measure the motion parallax, or translational movement, of the scene. The depth map was constructed from this sequence of video images by algorithms implemented on a VAX computer. The creation of the depth map and the continuous estimation of the vehicle motion were accomplished by scene analysis using paired even- and odd-symmetric filters that are normalized Gabor filters (a cosine times a Gaussian and a sine times a Gaussian), as described earlier herein. The even- and odd-symmetric filters are oriented filters that compute both the smoothed contrast and the corresponding spatial gradient at each spatial position. Thus, these paired filters provided two independent measures of the contrast differences in the scene. These paired filters were computed at several orientations and resolutions (spatial scales) using a coarse-to-fine matching procedure. The filters are oriented orthogonal to the direction of vehicle movement. In performing this operation, the scene is scanned until the first suprathreshold contrast difference (which denotes an object boundary) is found before determining the object map that is used to compute motion parallax. The amount an object's position is shifted between each scene is used to determine the motion parallax that is then used to partition the scene into objects at different depths. Thresholding of the outputs of the paired filters analyzing the same object in different scenes is used for the matching. Thus, intelligent matching instead of exhaustive matching is used. This is important for real-time applications, as it reduces the amount of computing (and, therefore, the time) that is consumed. The depth map is constructed using oriented paired filters analyzed over different windows of attention, determined by different object widths, across the scene. Thus, an area-based approach is used to partition the scene into objects at different depths using oriented filters. The paired even- and odd-symmetric filters include a nonlinear contrast transducer function that incorporates gain and threshold changes induced by different scene parameters, such as illumination, to normalize the output of the paired filters. For each object, the location that produces the maximum output of paired even- and odd-symmetric filters at low resolutions (implemented using lowpass filtering) is used to determine the width of the object. Higher spatial scales are used for verification. A depth map is then constructed using regions of constant velocity to construct each object's height and width, and changes in the direction of movement relative to the vehicle and other objects to infer the position of these objects in a three-dimensional scene. The closer the object is to the camera, the larger the distance traversed by the object in each video image of the scene, as described with respect to the simplified drawings of FIGS. 1-4. The depth map is constructed by analyzing the part of the scene that is closest to the vehicle's cameras before analyzing higher elevations, because closer objects move farther than, and can occlude, more distant objects. The depth map is updated by minimizing deviations from expected changes in object boundaries at different depths over time. This approach takes into account several important parameters in the matching procedure that should be used by a robotic vision system that constructs three-dimensional depth maps from two-dimensional luminance differences, such as the occlusion of far objects by closer objects, the clustering of moving images at the same depth, and the relative movement of objects at different depths as one views the scene from different perspectives.
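The threshold-based matching step can be illustrated in one dimension. In this sketch a simple finite difference stands in for the odd-symmetric (gradient) filter, the first suprathreshold contrast difference in each view marks the object boundary, and, assuming pure sideways translation of the camera, depth is taken as inversely proportional to the boundary's shift. The function names, threshold, and synthetic rows are hypothetical.

```python
import numpy as np


def first_edge(row, threshold):
    """Index of the first suprathreshold contrast difference in a luminance row, or None."""
    gradient = np.abs(np.diff(np.asarray(row, dtype=float)))
    hits = np.flatnonzero(gradient > threshold)
    return int(hits[0]) if hits.size else None


def parallax_shift(row_t1, row_t2, threshold):
    """Displacement (pixels) of the first edge between two views, or None if unmatched."""
    a, b = first_edge(row_t1, threshold), first_edge(row_t2, threshold)
    return None if a is None or b is None else b - a


def relative_depths(shifts):
    """Depth taken as inversely proportional to shift; the nearest object is scaled to 1."""
    largest = max(abs(s) for s in shifts.values())
    return {name: largest / abs(s) for name, s in shifts.items()}


def step_row(n, edge, level):
    """Synthetic luminance row with a single step edge at index `edge`."""
    row = np.zeros(n)
    row[edge:] = level
    return row


shifts = {
    "near": parallax_shift(step_row(50, 10, 1.0), step_row(50, 14, 1.0), 0.1),
    "far": parallax_shift(step_row(50, 30, 0.5), step_row(50, 31, 0.5), 0.1),
}
print(shifts, relative_depths(shifts))  # {'near': 4, 'far': 1} {'near': 1.0, 'far': 4.0}
```

A boundary that does not move between views (zero shift) would correspond to an object at effectively infinite distance and would need to be handled separately before computing relative depths.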

Stereo correlation and active ranging techniques for constructing a depth map for robotic vehicles and telerobots rely on static images. Constructing depth maps for robotic vision using motion parallax, which relies on dynamic images, provides the following advantages:

able to continually update the depth map and maintain knowledge of the vehicle's motion as it moves through the environment on a real-time basis,

requires only one camera to construct the depth map, if necessary,

paired filters are computationally efficient across space and over time; they optimize resolution and signal-to-noise ratios across space and over time in both the frequency and spatial-temporal domains (Δx·Δf ≥ ½ and Δt·Δf ≥ ½),

uses intelligent matching based on thresholding, instead of exhaustive matching,

uses both area-based and feature-based matching (area-based approaches have difficulty with man-made objects; feature-based approaches have difficulty with natural terrains),

filters are based on biological filters used for (natural) movement discrimination by human observers,

both close and distant objects can be ranged at high resolution,

the filters analyzing the scene are robust for different scene parameters (for example, different scene illuminations),

provides an economical and robust means to continually update the depth map during vehicle motion,

designed using a structured hierarchical, parallel, and interactive (feedforward and feedback) approach to verify and update the depth map of the scene.

I claim: 

1 . A machine vision system for determining move- 
ment in a 2-dimensional field of view comprising: 

a) video camera means for viewing the field of view 
and producing 2-dimensional binary representa- 
tions of the pattern thereof at consecutive times ti 
and t2; 

b) image enhancement means for receiving said 2- 
dimensional binary representations of the field of 
view and for producing enhanced 2-dimensional 
binary representations thereof; 

c) computational means for producing smoothed 
versions of said enhanced 2-dimensional binary 
representations, said computational means includ- 
ing means for filtering binary data comprising said 
enhanced 2-dimensional binary representations to 
provided the spatial gradients of components 
thereof comprising a background frame of refer- 
ence and components identified with objects mov- 
ing against said background frame of reference; 

d) means for producing the temporal gradients of said 
enhanced 2-dimensional binary representations; 
and, 

e) sensing means for comparing said spatial and temporal
gradients at said consecutive times to one
another to determine any motion parallax existing
in the field of view, whereby movement of objects
in the field of view is determined.

2 . The machine vision system of claim 1 wherein: 

the outputs from paired Gabor filters summed across
said background frame of reference are used by
said sensing means to determine the direction of
movement.

3 . The machine vision system of claim 1 and addition- 
ally including: 

a) means for encoding the image intensity of said 
pattern of the field of view by separable spatial and 
temporal filters consisting of paired even- and odd- 
symmetric simple cells in quadrature phase; 

b) for each combination of said paired even- and
odd-symmetric filters at each spatial scale being
analyzed, means for passing the input signal
through said spatial and temporal filters to produce
four separable responses wherein the output from
each said spatial filter is processed during two
different time intervals by a low-pass temporal filter
and a bandpass temporal filter; and,

c) means for taking the sums and differences of the 
outputs from said paired filters to produce spati- 
otemporally-oriented nonlinear responses that are 
selective for the direction of any motion. 

4 . The machine vision system of claim 3 wherein: 
said paired even- and odd-symmetric simple cells in
quadrature phase are tuned to a 90° spatial-phase
difference.

5. The machine vision system of claim 3 wherein said 
separable spatial and temporal filters comprise: 

a) an even-symmetric Gabor function, $F_{ES}$, described
as,

$$F_{ES}(f, x, \sigma^2) = \cos(2\pi f x)\, e^{-(x - x_0)^2/(2\sigma^2)};$$

and,

b) an odd-symmetric Gabor function, $F_{OS}$, described
as,

$$F_{OS}(f, x, \sigma^2) = \sin(2\pi f x)\, e^{-(x - x_0)^2/(2\sigma^2)},$$

where $f$ corresponds to the spatial frequency of
said pattern, $x$ corresponds to the horizontal spatial
position being examined, $x_0$ corresponds to said
video camera means' fixation point, and $\sigma^2$ corresponds
to the variability within said pattern's spatial
period in locating the position of the most salient
contrast difference, $x_0$.
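The Gabor pair of claim 5 can be evaluated directly. The Python sketch below is illustrative only: the function names and parameter values are hypothetical, and it follows the formulas as written, with the carrier measured from x = 0 and the Gaussian envelope centred on the fixation point x0. With x0 = 0 the even function is symmetric and the odd function antisymmetric, so their inner product over a grid symmetric about 0 vanishes, which is the sense in which the pair is in quadrature.

```python
import math


def gabor_even(f: float, x: float, x0: float, sigma2: float) -> float:
    """Even-symmetric Gabor: cos(2*pi*f*x) under a Gaussian envelope centred on x0."""
    envelope = math.exp(-((x - x0) ** 2) / (2.0 * sigma2))
    return math.cos(2.0 * math.pi * f * x) * envelope


def gabor_odd(f: float, x: float, x0: float, sigma2: float) -> float:
    """Odd-symmetric Gabor: sin(2*pi*f*x) under the same Gaussian envelope."""
    envelope = math.exp(-((x - x0) ** 2) / (2.0 * sigma2))
    return math.sin(2.0 * math.pi * f * x) * envelope


def inner_product(f: float, x0: float, sigma2: float, half_width: int, step: float) -> float:
    """Discrete inner product of the even and odd filters on a grid symmetric about 0."""
    total = 0.0
    for i in range(-half_width, half_width + 1):
        x = i * step
        total += gabor_even(f, x, x0, sigma2) * gabor_odd(f, x, x0, sigma2)
    return total


if __name__ == "__main__":
    # Hypothetical parameters: 0.25 cycles per unit, fixation at 0, sigma^2 = 4.
    print(gabor_even(0.25, 0.0, 0.0, 4.0))  # → 1.0 (peak of the even filter)
    print(inner_product(0.25, 0.0, 4.0, half_width=200, step=0.05))  # close to 0
```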

6. The machine vision system of claim 3 and addition- 
ally including: 

means for discriminating the direction an object in 
said pattern moves relative to said background 
frame of reference at two different times whenever 
an output from said means for taking the sums and 
differences of outputs from said paired filters 
pooled across said background frame of reference 
exceeds a pre-established threshold. 

7. The machine vision system of claim 3 wherein: 

a) said low-pass temporal filter $F_{ES}$ is described by,

$$\sum_{x=x_0-B/2}^{x_0+B/2}\left[\frac{k_f C_b^{n}+k_{fb}}{C_b^{n}+C_0}\,F_{ES}-\frac{k_t C_t^{n}+k_{ft}}{C_t^{n}+C_0}\,F_{OS}\right]\quad\text{at time } t_1;$$

b) and said bandpass temporal filter $F_{OS}$ is described
by,

$$\sum_{x=x_0-B/2}^{x_0+B/2}\left[\frac{k_f C_b^{n}+k_{fb}}{C_b^{n}+C_0}\,F_{ES}+\frac{k_t C_t^{n}+k_{ft}}{C_t^{n}+C_0}\,F_{OS}\right]\quad\text{at time } t_1,$$

where $C_t$ corresponds to the contrast of a test
pattern, $k_{ft}$ corresponds to the contrast threshold
for a test frequency, $C_b$ corresponds to the contrast
of the pattern of said background frame of reference,
$k_{fb}$ corresponds to the contrast threshold for
the frequencies of said background frame of reference,
$C_0$, which depends on the temporal frequency,
is a constant that corresponds to the signal-to-noise
ratio used when detecting left-right movement of a
test pattern relative to said background frame of
reference, $n$ corresponds to the slope of the contrast
response function, $m$ is a constant, $B$ corresponds
to the spatial period of said background
frame of reference, $x_0$ corresponds to a zero-crossing
or contrast difference used as the point of reference
to judge the direction said test pattern moved
between two pattern presentations at times $t_1$ and
$t_2$, and $k_t$, $k_f$ are changeable constants for the gain of
the contrast sensitivity which may be changed by
feedback.
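The claim-7 expressions pool a contrast-weighted combination of the two Gabor outputs across one background period. The Python sketch below assumes one reading of the expressions: a background weight (k_f C_b^n + k_fb)/(C_b^n + C_0) multiplying F_ES and a test weight (k_t C_t^n + k_ft)/(C_t^n + C_0) multiplying F_OS, combined with a minus sign for the low-pass form (a) and a plus sign for the bandpass form (b), summed over x from x0 − B/2 to x0 + B/2. All function and parameter names, and the sampling step, are hypothetical.

```python
import math


def f_es(f, x, x0, sigma2):
    """Even-symmetric Gabor (cosine carrier, Gaussian envelope at x0)."""
    return math.cos(2 * math.pi * f * x) * math.exp(-((x - x0) ** 2) / (2 * sigma2))


def f_os(f, x, x0, sigma2):
    """Odd-symmetric Gabor (sine carrier, same envelope)."""
    return math.sin(2 * math.pi * f * x) * math.exp(-((x - x0) ** 2) / (2 * sigma2))


def contrast_weight(c, gain, threshold, n, c0):
    """Hyperbolic contrast weighting (gain*C^n + threshold) / (C^n + C0)."""
    cn = c ** n
    return (gain * cn + threshold) / (cn + c0)


def pooled_response(sign, f, x0, sigma2, period_b, c_b, c_t,
                    k_f, k_t, k_fb, k_ft, n, c0, step=0.05):
    """Sum of w_b*F_ES + sign*w_t*F_OS over x in [x0 - B/2, x0 + B/2].

    sign = -1 gives the difference (low-pass) form, +1 the sum (bandpass) form.
    """
    w_b = contrast_weight(c_b, k_f, k_fb, n, c0)  # background-pattern weight
    w_t = contrast_weight(c_t, k_t, k_ft, n, c0)  # test-pattern weight
    samples = int(round(period_b / step))
    total = 0.0
    for i in range(samples + 1):
        x = x0 - period_b / 2 + i * step
        total += w_b * f_es(f, x, x0, sigma2) + sign * w_t * f_os(f, x, x0, sigma2)
    return total
```

In this literal reading, a window symmetric about x0 = 0 makes the odd term cancel, so the two forms only separate when the pattern or fixation point breaks that symmetry.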

8 . The machine vision system of claim 1 wherein said 
video camera means comprises: 

a) a video camera having an input lens for viewing an
area of interest and producing an analog signal at
an output thereof reflecting a 2-dimensional area of
interest being viewed by said input lens;

b) analog to digital conversion means connected to
said output of said video camera for converting
said analog signal into a binary signal; and,

c) digital buffer means for storing said binary signal.

9. In a machine vision system including a video camera
having an input lens for viewing an area of interest
and producing an analog signal at an output thereof
reflecting a 2-dimensional area of interest being viewed
by the input lens, an analog to digital converter connected
to the output of the video camera for converting
the analog signal into a binary signal, a digital buffer for
storing the binary signal, and logic for analyzing the
binary signal at times t1 and t2 to provide information
about the contents of the area of interest, the method for
determining movement in the area of interest comprising
the steps of:

a) filtering the binary signal from the digital buffer at
times t1 and t2 to produce enhanced 2-dimensional
binary representations of the area of interest at those
times;

b) producing smoothed versions of the enhanced
2-dimensional binary representations;

c) processing the enhanced 2-dimensional binary representations
to provide the spatial gradients of
components thereof comprising a background
frame of reference and components identified with
objects moving against the background frame of
reference;

d) producing the temporal gradients of the enhanced
2-dimensional binary representations; and,

e) comparing the spatial and temporal gradients at
times t1 and t2 to one another to determine any
motion parallax existing in the area of interest,
whereby movement of objects in the area of interest
is determined.
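Steps (b) through (e) of claim 9 (smooth, take spatial and temporal gradients, compare them) can be illustrated on one image row. The Python sketch below substitutes a standard technique that the claim does not name: the 1-D brightness-constancy constraint I_x·v + I_t = 0 solved by least squares, v = −Σ I_x I_t / Σ I_x². The function names, the Gaussian width, and the edge handling are assumptions; positive v means the pattern moved toward larger x between t1 and t2.

```python
import math


def smooth(signal, sigma):
    """1-D Gaussian smoothing; indices beyond the ends are clamped."""
    radius = int(3 * sigma)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    last = len(signal) - 1
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for offset, weight in zip(range(-radius, radius + 1), kernel):
            acc += weight * signal[min(max(i + offset, 0), last)]
        out.append(acc / norm)
    return out


def estimate_shift(frame1, frame2, sigma=2.0):
    """Least-squares shift (samples per frame) from spatial and temporal gradients."""
    s1, s2 = smooth(frame1, sigma), smooth(frame2, sigma)
    margin = int(3 * sigma) + 1  # skip samples affected by edge clamping
    num = den = 0.0
    for i in range(margin, len(s1) - margin):
        # Central-difference spatial gradient, averaged over the two frames.
        ix = ((s1[i + 1] - s1[i - 1]) + (s2[i + 1] - s2[i - 1])) / 4.0
        it = s2[i] - s1[i]  # temporal gradient between t1 and t2
        num += ix * it
        den += ix * ix
    return -num / den if den > 0 else 0.0
```

For a sinusoid shifted one sample to the right, this estimate is slightly above 1 (about 1.01 for a 32-sample period), because finite differences slightly underestimate the true spatial gradient.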

10. The method of claim 9 and additionally including
the steps of:

a) summing the outputs from paired Gabor filters 
across the background frame of reference; and, 

b) using the summed output to determine the direc- 
tion of movement. 

11. The method of claim 9 and additionally including
the steps of:

a) encoding the image intensity of the pattern of the
area of interest by separable spatial and temporal
filters consisting of paired even- and odd-symmetric
simple cells in quadrature phase;

b) for each combination of the paired even- and odd-symmetric
filters at each spatial scale being analyzed,
passing the input signal through the spatial
and temporal filters to produce four separable responses
wherein the output from each spatial
filter is processed during two different time intervals
by a low-pass temporal filter and a bandpass
temporal filter; and,

c) taking the sums and differences of the outputs from
the paired filters to produce spatiotemporally-oriented
nonlinear responses that are selective for the
direction of any motion.
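The four separable responses and their sums and differences in claim 11 can be illustrated with a two-frame version of the standard motion-energy construction (in the style of Adelson and Bergen). This is a substituted technique, not taken from the patent: the temporal low-pass output is assumed to be the mean of the two frames' Gabor outputs and the temporal bandpass output their difference. Function names and the sign convention (positive = rightward, toward larger x) are hypothetical.

```python
import math


def quadrature_outputs(frame, f, sigma2, step):
    """Even and odd Gabor outputs at fixation point x0 = 0 for one sampled frame.

    frame[i] is the intensity at x = (i - centre) * step, centre = len(frame) // 2.
    """
    centre = len(frame) // 2
    even = odd = 0.0
    for i, value in enumerate(frame):
        x = (i - centre) * step
        envelope = math.exp(-(x * x) / (2 * sigma2))
        even += value * math.cos(2 * math.pi * f * x) * envelope
        odd += value * math.sin(2 * math.pi * f * x) * envelope
    return even, odd


def opponent_energy(frame_t1, frame_t2, f, sigma2, step):
    """Rightward minus leftward energy from sums/differences of separable responses."""
    e1, o1 = quadrature_outputs(frame_t1, f, sigma2, step)
    e2, o2 = quadrature_outputs(frame_t2, f, sigma2, step)
    # Four separable responses: (even, odd) spatial x (low-pass, bandpass) temporal.
    e_low, o_low = (e1 + e2) / 2, (o1 + o2) / 2  # temporal low-pass (mean)
    e_band, o_band = e2 - e1, o2 - o1            # temporal bandpass (difference)
    # Sums and differences are spatiotemporally oriented; squaring makes them nonlinear.
    rightward = (e_low + o_band) ** 2 + (o_low - e_band) ** 2
    leftward = (e_low - o_band) ** 2 + (o_low + e_band) ** 2
    return rightward - leftward  # > 0 rightward, < 0 leftward, 0 if unchanged
```

Expanding the squares shows the result equals 4(e1·o2 − e2·o1), so its sign depends only on which way the pattern's phase advanced between the two frames.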

12. The method of claim 11 and additionally including
the step of:

tuning the paired even- and odd-symmetric simple 
cells in quadrature phase to a 90° spatial-phase 
difference. 
13. The method of claim 11 wherein the separable
spatial and temporal filters comprise:

a) an even-symmetric Gabor function, $F_{ES}$, described
as,

$$F_{ES}(f, x, \sigma^2) = \cos(2\pi f x)\, e^{-(x - x_0)^2/(2\sigma^2)};$$

and,

b) an odd-symmetric Gabor function, $F_{OS}$, described
as,

$$F_{OS}(f, x, \sigma^2) = \sin(2\pi f x)\, e^{-(x - x_0)^2/(2\sigma^2)},$$

where $f$ corresponds to the spatial frequency of
the pattern, $x$ corresponds to the horizontal spatial
position being examined, $x_0$ corresponds to the
video camera means' fixation point, and $\sigma^2$ corresponds
to the variability within the pattern's spatial
period in locating the position of the most salient
contrast difference, $x_0$.

14. The method of claim 11 and additionally includ- 
ing the step of: 

discriminating the direction an object in the pattern
moves relative to the background frame of reference
at two different times whenever the sums and
differences of outputs from the paired filters pooled
across the background frame of reference exceed a
pre-established threshold.

15. The method of claim 11 wherein: 

a) the low-pass temporal filter $F_{ES}$ is described by,

$$\sum_{x=x_0-B/2}^{x_0+B/2}\left[\frac{k_f C_b^{n}+k_{fb}}{C_b^{n}+C_0}\,F_{ES}-\frac{k_t C_t^{n}+k_{ft}}{C_t^{n}+C_0}\,F_{OS}\right]\quad\text{at time } t_1;$$

b) and the bandpass temporal filter $F_{OS}$ is described
by,

$$\sum_{x=x_0-B/2}^{x_0+B/2}\left[\frac{k_f C_b^{n}+k_{fb}}{C_b^{n}+C_0}\,F_{ES}+\frac{k_t C_t^{n}+k_{ft}}{C_t^{n}+C_0}\,F_{OS}\right]\quad\text{at time } t_1,$$

where $C_t$ corresponds to the contrast of a test
pattern, $k_{ft}$ corresponds to the contrast threshold
for a test frequency, $C_b$ corresponds to the contrast
of the pattern of the background frame of reference,
$k_{fb}$ corresponds to the contrast threshold for
the frequencies of the background frame of reference,
$C_0$, which depends on the temporal frequency,
is a constant that corresponds to the signal-to-noise
ratio used when detecting left-right movement of a
test pattern relative to the background frame of
reference, $n$ corresponds to the slope of the contrast
response function, $m$ is a constant, $B$ corresponds
to the spatial period of the background
frame of reference, $x_0$ corresponds to a zero-crossing
or contrast difference used as the point of reference
to judge the direction the test pattern moved
between two pattern presentations at times $t_1$ and
$t_2$, and $k_t$, $k_f$ are changeable constants for the gain of
the contrast sensitivity which may be changed by
feedback.

16. The method of claim 15 and first additionally
including determining the direction a test pattern moves
relative to a multi-frequency background by the steps
of:

a) computing and comparing the difference and sum 
of the output of paired Gabor filters during each of 
a pair of temporal intervals being examined; 
b) establishing leftward movement by the difference
between the outputs; and,

c) establishing rightward movement by the sum of the 
outputs. 

17. The method of claim 16 and first additionally 
including the steps of: 

a) discriminating left-right movement at one spatial 
position when the difference or sum of the paired 
even- and odd-symmetric filters at time ti differs 
significantly from the difference or sum at time t2; 
and, 

b) squaring the output of the paired filters to intro- 
duce a component at the difference frequency 
whereby the sign of the temporal gradient deter- 
mines whether the test pattern moved to the right 
or moved to the left of the background’s peak lumi- 
nance. 
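The difference-frequency component that claim 17 attributes to squaring follows from a standard trigonometric identity. For two components with amplitudes $a$ and $b$ and frequencies $\omega_1$ and $\omega_2$ (symbols introduced here only for illustration):

$$\left(a\cos\omega_1 t + b\cos\omega_2 t\right)^2 = \frac{a^2+b^2}{2} + \frac{a^2}{2}\cos 2\omega_1 t + \frac{b^2}{2}\cos 2\omega_2 t + ab\cos\big((\omega_1-\omega_2)t\big) + ab\cos\big((\omega_1+\omega_2)t\big)$$

The $ab\cos\big((\omega_1-\omega_2)t\big)$ term is the difference-frequency component; the unsquared (linear) filter output contains only $\omega_1$ and $\omega_2$.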

18. The method of claim 17 and additionally includ- 
ing the steps of: 

a) detecting the contrast of each pattern component
using filters tuned to spatial frequencies having a
bandwidth of 1 to 1½ octaves; and,

b) employing a hyperbolic contrast response function 
such as that which optimally characterizes the 
contrast transducer function of normal observers 
and simple and complex cells in the striate cortex to 
predict the minimum contrast needed to discrimi- 
nate the direction of movement. 

19. The method of claim 18 wherein: 

the hyperbolic contrast transducer function is a non- 
linear function that contains both a nonlinear 
power function for the contrast of the test pattern 
and thresholding factors. 
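Claims 18 and 19 use a hyperbolic contrast transducer to predict the minimum contrast needed to discriminate direction. Assuming the transducer has the same form as the contrast weighting in claim 7, R(C) = (k·C^n + k_th)/(C^n + C_0), and assuming a hypothetical criterion response R* for "discriminable", the minimum contrast has a closed form, C_min = ((R*·C_0 − k_th)/(k − R*))^(1/n). A minimal Python sketch with hypothetical names:

```python
def transducer(c, gain, threshold, n, c0):
    """Hyperbolic contrast response (gain*C^n + threshold) / (C^n + C0)."""
    cn = c ** n
    return (gain * cn + threshold) / (cn + c0)


def min_contrast(criterion, gain, threshold, n, c0):
    """Smallest contrast whose response reaches `criterion`, or None if unreachable.

    Solves gain*C^n + threshold = criterion*(C^n + C0) for C; valid when the
    response rises from threshold/C0 at C = 0 toward `gain` as C grows.
    """
    floor = threshold / c0  # response to zero contrast
    if criterion <= floor:
        return 0.0
    if criterion >= gain:
        return None  # the asymptote never reaches the criterion
    return ((criterion * c0 - threshold) / (gain - criterion)) ** (1.0 / n)
```

For example, with gain 1.0, threshold 0.01, n = 2 and C0 = 0.04, a criterion of 0.5 gives C_min = √0.02 ≈ 0.141, and the transducer evaluated there returns 0.5.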

20. In a machine vision system including a video
camera having an input lens for viewing an area of
interest and producing an analog signal at an output
thereof reflecting a 2-dimensional area of interest being
viewed by the input lens, an analog to digital converter
connected to the output of the video camera for converting
the analog signal into a binary signal, a digital
buffer for storing the binary signal, and logic for analyzing
the binary signal at times t1 and t2 to provide information
about the contents of the area of interest, the
method for determining movement in the area of interest
by establishing the 2-dimensional velocity of objects
and converting them into a 3-dimensional position and
depth map comprising the steps of:

a) horizontally scanning the area of interest to create 
a sequence of video images taken at small distance 
intervals so as to measure motion parallax or trans- 
lational movement taking place in the area of inter- 
est; and, 

b) constructing the depth map by scene analysis using 
paired even- and odd-symmetric filters that are 
normalized Gabor filters, that comprise a cosine 
times a Gaussian and a sine times a Gaussian, the 
even- and odd-symmetric filters being oriented 
filters that compute both the smoothed contrast 
and the corresponding spatial gradient at each spa- 
tial position to provide two independent measures 
of the contrast differences in the area of interest, 
the paired filters being computed at several orienta- 
tions and resolutions using a coarse-to-fine match- 
ing procedure; 

wherein said step of scanning the area of interest in- 
cludes the step of: 

dividing the area of interest into rectangular windows
of attention determined by different object widths
wherein closer occluding objects determine a maximum
window of attention that can be used to view
more distant objects;
wherein:

a) within each window of attention, the location that
produces the maximum output of the paired even-
and odd-symmetric filters at low resolutions that is
above a pre-established threshold is used to determine
the width of objects that are used to partition
the area of interest into different depth regions, with
those objects being verified at higher resolution
spatial scales;

b) constructing the depth map using regions of constant
velocity to construct each moving object's
height and width and changes in the direction of
movement relative to the video camera and other
objects in the scene to infer the position of the
objects in a three-dimensional scene, such that the
closer an object is to the video camera, the larger
the distance traversed by the object in each video
image of the scene; and,

c) updating the depth map by minimizing deviations
from expected changes in object boundaries at
different depths over time.
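Claim 20 relies on nearer objects moving farther across the image than distant ones while the camera scans horizontally. Under a pinhole-camera assumption not stated in the claim (focal length f in pixels, lateral camera translation T between frames, image displacement d in pixels), depth is Z = f·T/|d|. A minimal Python sketch with hypothetical names:

```python
import math


def depth_from_parallax(displacement_px, focal_px, translation):
    """Depth Z = f*T/|d| for a laterally translating pinhole camera (inf if no motion)."""
    if displacement_px == 0:
        return math.inf
    return focal_px * translation / abs(displacement_px)


def order_by_depth(displacements, focal_px, translation):
    """Region labels sorted nearest first, given each region's image displacement."""
    depths = {label: depth_from_parallax(d, focal_px, translation)
              for label, d in displacements.items()}
    return sorted(depths, key=depths.get)
```

For example, with f = 500 pixels and T = 0.1 (in whatever length unit the depth should be reported), regions displaced by 20, 4 and 1 pixels lie at depths 2.5, 12.5 and 50, so the most-displaced region is ordered nearest.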

21. The method of claim 20 and additionally includ- 
ing the steps of: 

a) scanning the area of interest until a first suprathre- 
shold contrast difference is found; 

b) determining the motion parallax of objects moving
within the area of interest; and,

c) constructing the depth map using oriented paired 
filters analyzed over different windows of attention 
across the area of interest. 
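Step (a) of claim 21 scans the area of interest until a first suprathreshold contrast difference is found. The Python sketch below is a simplified, hypothetical version: it scans one row in fixed, non-overlapping windows and uses Michelson contrast, (max − min)/(max + min), as the contrast-difference measure, whereas the claims size the windows of attention by object widths.

```python
def first_suprathreshold_window(row, window, threshold):
    """Index of the first non-overlapping window whose Michelson contrast exceeds
    `threshold`, scanning left to right; None if no window does."""
    for start in range(0, len(row) - window + 1, window):
        segment = row[start:start + window]
        high, low = max(segment), min(segment)
        if high + low > 0 and (high - low) / (high + low) > threshold:
            return start // window
    return None
```

For a uniform row the function returns None; a window containing values 140 and 60 has contrast 0.4 and is reported once the threshold is below that.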

22. The method of claim 21 wherein:

the paired even- and odd-symmetric filters employed
in step (c) include a nonlinear contrast transducer
function that incorporates gain and threshold
changes induced by different scene parameters,
such as illumination, to normalize the output of the
paired filters.

23. The method of claim 21 and additionally includ- 
ing the step of: 

tuning the paired even- and odd-symmetric filters
employed in step (c) in quadrature phase to a 90°
spatial-phase difference.

24. The method of claim 21 wherein the paired even-
and odd-symmetric filters employed in step (c) are separable
spatial and temporal filters comprising:

a) an even-symmetric Gabor function, $F_{ES}$, described
as,

$$F_{ES}(f, x, \sigma^2) = \cos(2\pi f x)\, e^{-(x - x_0)^2/(2\sigma^2)};$$

and,

b) an odd-symmetric Gabor function, $F_{OS}$, described
as,

$$F_{OS}(f, x, \sigma^2) = \sin(2\pi f x)\, e^{-(x - x_0)^2/(2\sigma^2)},$$

where $f$ corresponds to the spatial frequency of
the pattern of the area of interest, $x$ corresponds to
the horizontal spatial position being examined, $x_0$
corresponds to the video camera's fixation point,
and $\sigma^2$ corresponds to the variability within the
pattern's spatial period in locating the position of
the most salient contrast difference, $x_0$.

25. The method of claim 24 and additionally includ- 
ing the step of: 

discriminating the direction an object moves relative
to the background frame of reference at two different
times whenever the sums and differences of
outputs from the paired filters pooled across the
background frame of reference exceed a pre-established
threshold.

26. The method of claim 24 wherein additionally: 

a) the low-pass temporal filter $F_{ES}$ is described by,

$$\sum_{x=x_0-B/2}^{x_0+B/2}\left[\frac{k_f C_b^{n}+k_{fb}}{C_b^{n}+C_0}\,F_{ES}-\frac{k_t C_t^{n}+k_{ft}}{C_t^{n}+C_0}\,F_{OS}\right]\quad\text{at time } t_1;$$

b) and the bandpass temporal filter $F_{OS}$ is described
by,

$$\sum_{x=x_0-B/2}^{x_0+B/2}\left[\frac{k_f C_b^{n}+k_{fb}}{C_b^{n}+C_0}\,F_{ES}+\frac{k_t C_t^{n}+k_{ft}}{C_t^{n}+C_0}\,F_{OS}\right]\quad\text{at time } t_1,$$

where $C_t$ corresponds to the contrast of a test
pattern, $k_{ft}$ corresponds to the contrast threshold
for a test frequency, $C_b$ corresponds to the contrast
of the pattern of the background frame of reference,
$k_{fb}$ corresponds to the contrast threshold for
the frequencies of the background frame of reference,
$C_0$, which depends on the temporal frequency,
is a constant that corresponds to the signal-to-noise
ratio used when detecting left-right movement of a
test pattern relative to the background frame of
reference, $n$ corresponds to the slope of the contrast
response function, $m$ is a constant, $B$ corresponds
to the spatial period of the background
frame of reference, $x_0$ corresponds to a zero-crossing
or contrast difference used as the point of reference
to judge the direction the test pattern moved
between two pattern presentations at times $t_1$ and
$t_2$, and $k_t$, $k_f$ are changeable constants for the gain of
the contrast sensitivity which may be changed by
feedback.

27. The method of claim 20 and additionally includ- 
ing the steps of: 

a) filtering the binary signal from the digital buffer at
times t1 and t2 to produce enhanced 2-dimensional
binary representations of the area of interest at those
times;

b) producing smoothed versions of the enhanced
2-dimensional binary representations;

c) processing the enhanced 2-dimensional binary representations
to provide the spatial gradients of
components thereof comprising a background
frame of reference and components identified with
objects moving against the background frame of
reference;

d) producing the temporal gradients of the enhanced
2-dimensional binary representations; and,

e) comparing the spatial and temporal gradients at
times t1 and t2 to one another to determine any
motion parallax existing in the area of interest,
whereby movement of objects in the area of interest
is determined.

***** 



